public inbox for gcc-patches@gcc.gnu.org
* [RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues
@ 2016-05-19 19:36 Ilya Enkovich
  2016-06-15 12:06 ` Richard Biener
  2016-06-16  6:28 ` Jeff Law
  0 siblings, 2 replies; 4+ messages in thread
From: Ilya Enkovich @ 2016-05-19 19:36 UTC
  To: gcc-patches

Hi,

This series is an extension of previous work on loop epilogue combining [1].

It introduces three ways to handle vectorized loop epilogues: combine the
epilogue with the vectorized loop, vectorize it with masks, or vectorize it
using a smaller vector size.

It also supports vectorization of loops with a low trip count.

Epilogue combining is used as the basic masking transformation.  Epilogue
masking and low trip count loop vectorization are treated as epilogue
combining with a zero trip count vector loop.
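
A minimal scalar sketch of the two masking variants, not the vectorizer's
actual output.  VF is an assumed vectorization factor of 4, each inner loop
over lanes stands in for one vector statement, and all function names are
hypothetical.  add_combined models a combined epilogue (every vector
iteration is masked, so no separate epilogue remains); add_masked_tail models
a masked epilogue that follows an unmasked vector loop.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* assumed vectorization factor, for illustration only */

/* Build the mask for the vector iteration starting at BASE: a lane is
   active while BASE + lane < N.  At least one lane must be active.  */
static void
make_mask (size_t base, size_t n, int mask[VF])
{
  assert (base < n);
  for (size_t lane = 0; lane < VF; lane++)
    mask[lane] = base + lane < n;
}

/* One unmasked "vector" iteration of a[i] += b[i].  */
static void
vec_add (int *a, const int *b, size_t base)
{
  for (size_t lane = 0; lane < VF; lane++)
    a[base + lane] += b[base + lane];
}

/* One masked "vector" iteration: inactive lanes touch no memory.  */
static void
masked_vec_add (int *a, const int *b, size_t base, const int mask[VF])
{
  for (size_t lane = 0; lane < VF; lane++)
    if (mask[lane])
      a[base + lane] += b[base + lane];
}

/* Combined epilogue: the vector loop itself is masked.  */
void
add_combined (int *a, const int *b, size_t n)
{
  int mask[VF];
  for (size_t i = 0; i < n; i += VF)
    {
      make_mask (i, n, mask);
      masked_vec_add (a, b, i, mask);
    }
}

/* Masked epilogue: unmasked vector loop, then at most one masked
   vector iteration for the remaining elements.  */
void
add_masked_tail (int *a, const int *b, size_t n)
{
  int mask[VF];
  size_t i = 0;
  for (; i + VF <= n; i += VF)
    vec_add (a, b, i);
  if (i < n)
    {
      make_mask (i, n, mask);
      masked_vec_add (a, b, i, mask);
    }
}
```

With n < VF the unmasked loop in add_masked_tail runs zero times, which is
the "zero trip count vector loop" case described above.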

Epilogue vectorization is controlled via a new option,
-ftree-vectorize-epilogues=, which takes a comma-separated list of enabled
modes: combine, mask, nomask.  There is a separate option,
-ftree-vectorize-short-loops, for low trip count loops.

To support epilogue vectorization I use a queue of loops to be vectorized in
vectorize_loops and change vect_transform_loop to return the generated epilogue
(in case we want to try to vectorize it).  If an epilogue is returned then it
is queued for processing.  This variant of epilogue processing was chosen
because it is simple and works for all epilogue processing options.
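
The queueing scheme can be sketched with a toy model; this is not the code
from the patch.  struct toy_loop, toy_transform and toy_vectorize_loops are
hypothetical stand-ins, and halving the vector factor for each epilogue is an
assumption made only to illustrate the "smaller vector size" mode.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOOPS 40  /* enough for any chain of halved vector factors */

/* Hypothetical stand-in for a loop: only trip count and VF matter.  */
struct toy_loop
{
  unsigned niters;
  unsigned vf;
};

/* Stand-in for vect_transform_loop: "vectorize" LOOP and, if scalar
   iterations remain, describe the epilogue in *EPILOGUE and return 1.  */
static int
toy_transform (const struct toy_loop *loop, struct toy_loop *epilogue)
{
  unsigned rest = loop->niters % loop->vf;
  if (rest == 0)
    return 0;
  epilogue->niters = rest;
  epilogue->vf = loop->vf / 2;  /* assumed: retry with half the VF */
  return 1;
}

/* Stand-in for vectorize_loops: process a queue of loops and push each
   returned epilogue back onto it.  Returns how many loops were
   vectorized.  */
unsigned
toy_vectorize_loops (unsigned niters, unsigned vf)
{
  struct toy_loop queue[MAX_LOOPS];
  size_t head = 0, tail = 0;
  unsigned vectorized = 0;

  queue[tail++] = (struct toy_loop) { niters, vf };
  while (head < tail)
    {
      struct toy_loop loop = queue[head++];
      struct toy_loop epilogue;

      /* Too short or no vector size left: the loop stays scalar.  */
      if (loop.vf < 2 || loop.niters < loop.vf)
        continue;

      vectorized++;
      if (toy_transform (&loop, &epilogue))
        {
          assert (tail < MAX_LOOPS);
          queue[tail++] = epilogue;
        }
    }
  return vectorized;
}
```

For 15 iterations and an initial VF of 8 this model vectorizes three loops
(VF 8, then 4, then 2) and leaves one scalar iteration.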

There are currently some limitations implied by this scheme:
 - The copied loop misses some required optimization info (e.g. scev info),
which may result in an epilogue which cannot be vectorized.
 - The loop epilogue may require if-conversion.
 - Alias/alignment checks are not inherited and therefore would be performed
one more time for the epilogue.  For now epilogue vectorization is simply
disabled in case alias versioning is required, and alignment enhancement is
disabled for epilogues.

There is a set of new fields added to _loop_vec_info to support epilogue
vectorization.

LOOP_VINFO_CAN_BE_MASKED - true if vectorized loop can be masked.  It is
computed during vectorization analysis (in various vectorizable_* functions).

LOOP_VINFO_REQUIRED_MASKS - for loop which can be masked it holds all masks
required to mask the loop.

LOOP_VINFO_COMBINE_EPILOGUE - true if we decided the vectorized loop should be
masked.

LOOP_VINFO_MASK_EPILOGUE - true if we decided an epilogue of this loop
should be vectorized and masked.

LOOP_VINFO_NEED_MASKING - true if vectorized loop has to be masked (set for
epilogues we want to mask and low trip count loops).

LOOP_VINFO_ORIG_LOOP_INFO - for epilogues this holds loop_vec_info of the
original vectorized loop.

To decide whether we want to mask or combine a loop epilogue, the
cost model is extended with masking costs.  This includes vect_masking_prologue
and vect_masking_body elements added to the vect_cost_model_location enum, and
finish_cost extended with two corresponding additional returned values.  In
addition to add_stmt_cost, I also add add_stmt_masking_cost to compute
the cost of masking a statement.

vect_estimate_min_profitable_iters checks if epilogue masking is profitable
and also computes the number of iterations required for profitable
epilogue combining (this number may be used as a threshold in the vectorized
loop guard).
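
As a rough illustration of the kind of comparison involved, here is a toy
cost model.  It is a sketch under simplifying assumptions, not the formula
used by vect_estimate_min_profitable_iters: struct toy_costs and all three
functions are hypothetical, costs are abstract units, and the only inputs are
a per-iteration scalar cost, a vector body cost, and the two masking costs.

```c
#include <assert.h>

/* Hypothetical per-loop costs, in abstract cost units.  */
struct toy_costs
{
  unsigned vf;                /* vectorization factor */
  unsigned scalar_iter;       /* one scalar iteration */
  unsigned vec_body;          /* one unmasked vector iteration */
  unsigned masking_body;      /* extra cost of masking one vector iteration */
  unsigned masking_prologue;  /* one-time cost of setting up masks */
};

/* Unmasked vector loop followed by a scalar epilogue.  */
unsigned
cost_scalar_epilogue (const struct toy_costs *c, unsigned niters)
{
  assert (c->vf > 0);
  return niters / c->vf * c->vec_body + niters % c->vf * c->scalar_iter;
}

/* Epilogue combined with the vector loop: every iteration is masked.  */
unsigned
cost_combined (const struct toy_costs *c, unsigned niters)
{
  assert (c->vf > 0);
  unsigned vec_iters = (niters + c->vf - 1) / c->vf;
  return c->masking_prologue + vec_iters * (c->vec_body + c->masking_body);
}

/* Nonzero if combining is cheaper than keeping a scalar epilogue.  */
int
combining_is_profitable (const struct toy_costs *c, unsigned niters)
{
  return cost_combined (c, niters) < cost_scalar_epilogue (c, niters);
}
```

In this model the per-iteration masking cost accumulates with the trip count,
so combining wins for short loops with a long scalar tail and loses once the
vector loop dominates.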

These patches do not enable any of the new features by default at any
optimization level.  Masking features are expected to be used mostly on
AVX-512 targets, and the lack of hardware suitable for wide performance
testing is the reason the cost model is not tuned and the optimizations are
not enabled by default.  With small tests using a small number of loop
iterations and 'heavy' epilogues (e.g. number of iterations is VF*2-1) I see
the expected ~2x gain on existing KNL hardware.  Later this year we expect to
get access to KNL machines and have an opportunity to tune the masking cost
model.

On Haswell hardware I don't see performance gains on similar loops, which means
masked code is not better than scalar code when masks are used heavily.
Masking still might be useful in case the number of statements requiring
masking is relatively small (I used the test a[i] += b[i], which needs masking
for 3 out of 4 vector statements).  We will continue to search for cases where
masking is profitable on Haswell to tune masking costs appropriately.

Below are the ChangeLogs for the whole series.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03014.html

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* common.opt (flag_tree_vectorize_epilogues): New.
	(ftree-vectorize-short-loops): New.
	(ftree-vectorize-epilogues=): New.
	(fno-tree-vectorize-epilogues): New.
	(fvect-epilogue-cost-model=): New.
	* flag-types.h (enum vect_epilogue_mode): New.
	* opts.c (parse_vectorizer_options): New.
	(common_handle_option): Support -ftree-vectorize-epilogues=
	and -fno-tree-vectorize-epilogues options.


gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* tree-vectorizer.h (struct _loop_vec_info): Add new fields
	can_be_masked, required_masks, mask_epilogue, combine_epilogue,
	need_masking, orig_loop_info.
	(LOOP_VINFO_CAN_BE_MASKED): New.
	(LOOP_VINFO_REQUIRED_MASKS): New.
	(LOOP_VINFO_COMBINE_EPILOGUE): New.
	(LOOP_VINFO_MASK_EPILOGUE): New.
	(LOOP_VINFO_NEED_MASKING): New.
	(LOOP_VINFO_ORIG_LOOP_INFO): New.
	(LOOP_VINFO_EPILOGUE_P): New.
	(LOOP_VINFO_ORIG_MASK_EPILOGUE): New.
	(LOOP_VINFO_ORIG_VECT_FACTOR): New.
	* tree-vect-loop.c (new_loop_vec_info): Initialize new
	_loop_vec_info fields.


gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* tree-if-conv.c (tree_if_conversion): Make public.
	* tree-if-conv.h: New file.
	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
	try to enhance alignment for epilogues.
	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
	created loop.
	* tree-vect-loop.c: Include tree-if-conv.h.
	(destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
	loop->aux.
	(vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
	loop->aux.
	(vect_analyze_loop): Reset loop->aux.
	(vect_transform_loop): Check if created epilogue should be returned
	for further vectorization.  If-convert epilogue if required.
	* tree-vectorizer.c (vectorize_loops): Add a queue of loops to
	process and insert vectorized loop epilogues into this queue.
	* tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return created
	loop.
	(vect_transform_loop): Return created loop.


gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* config/i386/i386.c (ix86_init_cost): Extend costs array.
	(ix86_add_stmt_masking_cost): New.
	(ix86_finish_cost): Add masking_prologue_cost and masking_body_cost
	args.
	(TARGET_VECTORIZE_ADD_STMT_MASKING_COST): New.
	* config/i386/i386.h (TARGET_INCREASE_MASK_STORE_COST): New.
	* config/i386/x86-tune.def (X86_TUNE_INCREASE_MASK_STORE_COST): New.
	* config/rs6000/rs6000.c (_rs6000_cost_data): Extend cost array.
	(rs6000_init_cost): Initialize new cost elements.
	(rs6000_finish_cost): Add masking_prologue_cost and masking_body_cost.
	* config/spu/spu.c (spu_init_cost): Extend costs array.
	(spu_finish_cost): Add masking_prologue_cost and masking_body_cost args.
	* doc/tm.texi.in (TARGET_VECTORIZE_ADD_STMT_MASKING_COST): New.
	* doc/tm.texi: Regenerated.
	* target.def (add_stmt_masking_cost): New.
	(finish_cost): Add masking_prologue_cost and masking_body_cost args.
	* target.h (enum vect_cost_for_stmt): Add vector_mask_load and
	vector_mask_store.
	(enum vect_cost_model_location): Add vect_masking_prologue
	and vect_masking_body.
	* targhooks.c (default_builtin_vectorization_cost): Support
	vector_mask_load and vector_mask_store.
	(default_init_cost): Extend costs array.
	(default_add_stmt_masking_cost): New.
	(default_finish_cost): Add masking_prologue_cost and masking_body_cost
	args.
	* targhooks.h (default_add_stmt_masking_cost): New.
	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Adjust
	finish_cost call.
	* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Likewise.
	* tree-vectorizer.h (add_stmt_masking_cost): New.
	(finish_cost): Add masking_prologue_cost and masking_body_cost args.


gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* tree-vect-loop.c: Include insn-config.h and recog.h.
	(vect_check_required_masks_widening): New.
	(vect_check_required_masks_narrowing): New.
	(vect_get_masking_iv_elems): New.
	(vect_get_masking_iv_type): New.
	(vect_get_extreme_masks): New.
	(vect_check_required_masks): New.
	(vect_analyze_loop_operations): Add vect_check_required_masks
	call to compute LOOP_VINFO_CAN_BE_MASKED.
	(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
	LOOP_VINFO_NEED_MASKING before starting over.
	(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
	masking cost.
	* tree-vect-stmts.c (can_mask_load_store): New.
	(vect_model_load_masking_cost): New.
	(vect_model_store_masking_cost): New.
	(vect_model_simple_masking_cost): New.
	(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
	and masking cost.
	(vectorizable_simd_clone_call): Likewise.
	(vectorizable_store): Likewise.
	(vectorizable_load): Likewise.
	(vect_stmt_should_be_masked_for_epilogue): New.
	(vect_add_required_mask_for_stmt): New.
	(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
	* tree-vectorizer.h (vect_model_load_masking_cost): New.
	(vect_model_store_masking_cost): New.
	(vect_model_simple_masking_cost): New.


gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* tree-vect-stmts.c (vectorizable_mask_load_store): Mark
	the first copy of generated vector stores.
	(vectorizable_store): Mark the first copy of generated
	vector stores and provide it with vectype and the original
	data reference.
	* tree-vectorizer.h (struct _stmt_vec_info): Add first_copy_p
	field.
	(STMT_VINFO_FIRST_COPY_P): New.


gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* dbgcnt.def (vect_tail_combine): New.
	* params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
	* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
	* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
	epilogue combined with loop body.
	(vect_do_peeling_for_loop_bound): Likewise.
	(vect_do_peeling_for_alignment): ???
	* tree-vect-loop.c: Include alias.h and dbgcnt.h.
	(vect_estimate_min_profitable_iters): Add ret_min_profitable_combine_niters
	arg, compute number of iterations for which loop epilogue combining is
	profitable.
	(vect_generate_tmps_on_preheader): Support combined epilogue.
	(vect_gen_ivs_for_masking): New.
	(vect_get_mask_index_for_elems): New.
	(vect_get_mask_index_for_type): New.
	(vect_gen_loop_masks): New.
	(vect_mask_reduction_stmt): New.
	(vect_mask_mask_load_store_stmt): New.
	(vect_mask_load_store_stmt): New.
	(vect_combine_loop_epilogue): New.
	(vect_transform_loop): Support combined epilogue.


gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* dbgcnt.def (vect_tail_mask): New.
	* tree-vect-loop.c (vect_analyze_loop_2): Support masked loop
	epilogues and low trip count loops.
	(vect_get_known_peeling_cost): Ignore scalar epilogue cost for
	loops we are going to mask.
	(vect_estimate_min_profitable_iters): Support masked loop
	epilogues and low trip count loops.
	* tree-vectorizer.c (vectorize_loops): Add a message for a case
	when loop epilogue can't be vectorized.


gcc/

2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>

	* tree-vect-loop.c (vect_transform_loop): Print more info
	about vectorized loop and specify used vector size.


* Re: [RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues
  2016-05-19 19:36 [RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues Ilya Enkovich
@ 2016-06-15 12:06 ` Richard Biener
  2016-06-16  4:14   ` Jeff Law
  2016-06-16  6:28 ` Jeff Law
  1 sibling, 1 reply; 4+ messages in thread
From: Richard Biener @ 2016-06-15 12:06 UTC
  To: Ilya Enkovich; +Cc: GCC Patches

So I've gone over the patches and gave mostly high-level comments.  The
vectorizer is already in a somewhat messy (aka not easy to follow) state, and
this series doesn't improve the situation (heh).  Esp. the high-level
structure for code generation and its documentation needs work (where we do
versioning / peeling and how we use the copies in which condition and where,
etc).

Now - given my question on the profitability code for vectorized body
masking I wonder if vectorized body masking shouldn't better be done via
adding another version for low tripcount loops (not < vf but say < vf * N
with N determined by a cost model).  Otherwise I can't see how we'd ever mask
the vectorized body for loops with a parametric number of iterations (most
loops in real life).
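
A scalar sketch of what such versioning could look like, under stated
assumptions: VF and LOW_TRIP_N are made-up constants (in the proposal N would
come from a cost model), add_versioned is a hypothetical name, and each inner
loop over lanes stands in for one vector statement.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4          /* assumed vectorization factor */
#define LOW_TRIP_N 2  /* assumed multiplier N; a cost model would pick it */

void
add_versioned (int *a, const int *b, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL));

  if (n < (size_t) VF * LOW_TRIP_N)
    {
      /* Low trip count version: every vector iteration is masked.  */
      for (size_t i = 0; i < n; i += VF)
        for (size_t lane = 0; lane < VF; lane++)
          if (i + lane < n)  /* models the per-lane mask */
            a[i + lane] += b[i + lane];
    }
  else
    {
      /* Regular version: unmasked vector loop plus scalar epilogue.  */
      size_t i = 0;
      for (; i + VF <= n; i += VF)
        for (size_t lane = 0; lane < VF; lane++)
          a[i + lane] += b[i + lane];
      for (; i < n; i++)
        a[i] += b[i];
    }
}
```

The runtime check on n replaces a compile-time profitability decision, so the
masked body is only entered for trip counts where masking overhead cannot
accumulate.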

Thanks,
Richard.


* Re: [RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues
  2016-06-15 12:06 ` Richard Biener
@ 2016-06-16  4:14   ` Jeff Law
  0 siblings, 0 replies; 4+ messages in thread
From: Jeff Law @ 2016-06-16  4:14 UTC
  To: Richard Biener, Ilya Enkovich; +Cc: GCC Patches

On 06/15/2016 06:05 AM, Richard Biener wrote:
> So I've gone over the patches and gave mostly high-level comments.
> The vectorizer is already in a somewhat messy (aka not easy to follow)
> state, and this series doesn't improve the situation (heh).  Esp. the
> high-level structure for code generation and its documentation needs
> work (where we do versioning / peeling and how we use the copies in
> which condition and where, etc).
Expecting major improvements here may not be realistic.  I think the 
question we need to answer is whether or not the improvements from this 
work justify the added complexity.

In an ideal world, I think we'd probably start over on the vectorizer, 
but the infrastructure we have is what it is -- a tangled mess that is 
difficult to understand without working in it regularly.

I'm still hoping to give this stuff a high level looksie before going on 
PTO later this month.

Jeff


* Re: [RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues
  2016-05-19 19:36 [RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues Ilya Enkovich
  2016-06-15 12:06 ` Richard Biener
@ 2016-06-16  6:28 ` Jeff Law
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff Law @ 2016-06-16  6:28 UTC
  To: Ilya Enkovich, gcc-patches

On 05/19/2016 01:35 PM, Ilya Enkovich wrote:
> Hi,
>
> This series is an extension of previous work on loop epilogue combining [1].
>
> It introduces three ways to handle vectorized loop epilogues: combine the
> epilogue with the vectorized loop, vectorize it with masks, or vectorize it
> using a smaller vector size.
>
> It also supports vectorization of loops with a low trip count.
[ ... ]
So now that I'm working through the patches the one obvious thing that 
is missing is testcases...   We should have tests for all the new 
capabilities.  It's probably advisable to have some tests for cases 
where the costing models say "don't vectorize the epilogue" as well.
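
A runtime check along these lines could serve as a starting point; it is only
a sketch.  kernel and run_checks are hypothetical names, and a real testsuite
entry would additionally need a main that aborts on failure plus the
appropriate dg directives and dump scans, which are not shown here because
the dump messages to match are not specified in this thread.

```c
#include <assert.h>

#define N 64

static int a[N], b[N];

/* Kernel under test.  noinline keeps the trip count unknown at the
   loop, so the epilogue handling is actually exercised.  */
__attribute__ ((noinline)) void
kernel (int *x, const int *y, int n)
{
  assert (n >= 0 && n <= N);
  for (int i = 0; i < n; i++)
    x[i] += y[i];
}

/* Run the kernel for every trip count from 0 to N and return the
   number of mismatching elements.  Elements at index >= n must stay
   untouched, which catches masks that cover too many lanes.  */
int
run_checks (void)
{
  int failures = 0;
  for (int n = 0; n <= N; n++)
    {
      for (int i = 0; i < N; i++)
        {
          a[i] = i;
          b[i] = 2 * i + 1;
        }
      kernel (a, b, n);
      for (int i = 0; i < N; i++)
        {
          int expected = i < n ? 3 * i + 1 : i;
          if (a[i] != expected)
            failures++;
        }
    }
  return failures;
}
```

Sweeping all trip counts covers the short-loop case (n below the vector
factor), exact multiples with no epilogue, and every possible tail length.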

Jeff
