From: Alan Lawrence <alan.lawrence@arm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Richard Biener <rguenther@suse.de>
Subject: Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result
Date: Wed, 24 Sep 2014 15:02:00 -0000 [thread overview]
Message-ID: <5422DCF3.9080206@arm.com> (raw)
In-Reply-To: <CAFiYyc0-GKObAWjnoA8ovz_QzfEk3E2nA=RKaUVt_3yYToqTnQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3862 bytes --]
So it looks like patches 1-6 (reduc_foo) are relatively close to final, and
given these fix PR/61114, I'm gonna try to land these while working on a respin
of the second half (vec_shr)...(summary: yes I like the vec_perm idea too, but
the devil is in the detail!)
However my CompileFarm account is still pending, so to that end, if you were
able to test patch 2/14 (attached inc. Richie's s/VIEW_CONVERT_EXPR/NOP_EXPR/)
on the CompileFarm PowerPC machine, that'd be great, many thanks indeed. It
should apply on its own without patch 1. I'll aim to get an alternative patch 3
back to the list shortly, and follow up with .md updates to the various backends.
Cheers, Alan
Richard Biener wrote:
> On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.
>>
>> These are presently documented as producing a vector with the result in
>> element 0, and this is inconsistent with their use in tree-vect-loop.c
>> (which on bigendian targets pulls the bits out of the wrong end of the
>> vector result). This leads to bugs on bigendian targets - see also
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.
>>
>> I discounted "fixing" the vectorizer (to read from element 0) and then
>> making bigendian targets (whose architectural insn produces the result in
>> lane N-1) permute the result vector, as optimization of vectors in RTL seems
>> unlikely to remove such a permute and would lead to a performance
>> regression.
>>
>> Instead it seems more natural for the tree code to produce a scalar result
>> (producing a vector with the result in lane 0 has already caused confusion,
>> e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).
>>
>> However, this patch preserves the meaning of the optab (producing a result
>> in lane 0 on little-endian architectures or N-1 on bigendian), thus
>> generally avoiding the need to change backends. Thus, expr.c extracts an
>> endianness-dependent element from the optab result to give the result
>> expected for the tree code.
>>
>> Previously posted as an RFC
>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra
>> VIEW_CONVERT_EXPR if the types of the reduction/result do not match.
>
> Huh. Does that ever happen? Please use a NOP_EXPR instead of
> a VIEW_CONVERT_EXPR.
>
> Ok with that change.
>
> Thanks,
> Richard.
>
>> Testing:
>> x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++
>> aarch64-none-linux-gnu: bootstrap
>> aarch64-none-elf: check-gcc, check-g++
>> arm-none-eabi: check-gcc
>>
>> aarch64_be-none-elf: check-gcc, showing
>> FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
>> FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
>> Passes the (previously-failing) reduced testcase on
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114
>>
>> Have also assembler/stage-1 tested that testcase on PowerPC, also
>> fixed.
>
>> gcc/ChangeLog:
>>
>> * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
>> extract_bit_field around optab result.
>>
>> * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR,
>> produce
>> scalar not vector.
>>
>> * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand
>> type
>> for REDUC_{MIN,MAX,PLUS}_EXPR.
>>
>> * tree-vect-loop.c (vect_analyze_loop): Update comment.
>> (vect_create_epilog_for_reduction): For direct vector reduction, use
>> result of tree code directly without extract_bit_field.
>>
>> * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
>> comment.
>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 2_make_reduc_tree_codes_scalar.patch --]
[-- Type: text/x-patch; name=2_make_reduc_tree_codes_scalar.patch, Size: 6705 bytes --]
commit a7b173d5efc6f08589b04fffeec9b3942b6282a0
Author: Alan Lawrence <alan.lawrence@arm.com>
Date: Tue Jul 29 11:46:01 2014 +0100
Make tree codes produce scalar, with NOP_EXPRs. (tree-vect-loop.c mess)
diff --git a/gcc/expr.c b/gcc/expr.c
index a6233f3..c792028 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9044,7 +9044,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
{
op0 = expand_normal (treeop0);
this_optab = optab_for_tree_code (code, type, optab_default);
- temp = expand_unop (mode, this_optab, op0, target, unsignedp);
+ enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
+ temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+ gcc_assert (temp);
+ /* The tree code produces a scalar result, but (somewhat by convention)
+ the optab produces a vector with the result in element 0 if
+ little-endian, or element N-1 if big-endian. So pull the scalar
+ result out of that element. */
+ int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
+ int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+ temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
+ target, mode, mode);
gcc_assert (temp);
return temp;
}
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index d1b59a1..1773585 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -8246,12 +8246,13 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
case REDUC_MAX_EXPR:
case REDUC_PLUS_EXPR:
{
- unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i;
+ unsigned int nelts, i;
tree *elts;
enum tree_code subcode;
if (TREE_CODE (op0) != VECTOR_CST)
return NULL_TREE;
+ nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0));
elts = XALLOCAVEC (tree, nelts);
if (!vec_cst_ctor_to_array (op0, elts))
@@ -8270,10 +8271,9 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
elts[0] = const_binop (subcode, elts[0], elts[i]);
if (elts[0] == NULL_TREE || !CONSTANT_CLASS_P (elts[0]))
return NULL_TREE;
- elts[i] = build_zero_cst (TREE_TYPE (type));
}
- return build_vector (type, elts);
+ return elts[0];
}
default:
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index e89d76a..6c6ff18 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3539,12 +3539,21 @@ verify_gimple_assign_unary (gimple stmt)
return false;
}
-
- case VEC_UNPACK_HI_EXPR:
- case VEC_UNPACK_LO_EXPR:
case REDUC_MAX_EXPR:
case REDUC_MIN_EXPR:
case REDUC_PLUS_EXPR:
+ if (!VECTOR_TYPE_P (rhs1_type)
+ || !useless_type_conversion_p (lhs_type, TREE_TYPE (rhs1_type)))
+ {
+ error ("reduction should convert from vector to element type");
+ debug_generic_expr (lhs_type);
+ debug_generic_expr (rhs1_type);
+ return true;
+ }
+ return false;
+
+ case VEC_UNPACK_HI_EXPR:
+ case VEC_UNPACK_LO_EXPR:
case VEC_UNPACK_FLOAT_HI_EXPR:
case VEC_UNPACK_FLOAT_LO_EXPR:
/* FIXME. */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fd1166f..8d97e17 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1892,9 +1892,9 @@ vect_analyze_loop (struct loop *loop)
Output:
REDUC_CODE - the corresponding tree-code to be used to reduce the
- vector of partial results into a single scalar result (which
- will also reside in a vector) or ERROR_MARK if the operation is
- a supported reduction operation, but does not have such tree-code.
+ vector of partial results into a single scalar result, or ERROR_MARK
+ if the operation is a supported reduction operation, but does not have
+ such a tree-code.
Return FALSE if CODE currently cannot be vectorized as reduction. */
@@ -4168,6 +4168,7 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
if (reduc_code != ERROR_MARK && !slp_reduc)
{
tree tmp;
+ tree vec_elem_type;
/*** Case 1: Create:
v_out2 = reduc_expr <v_out1> */
@@ -4176,14 +4177,26 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
dump_printf_loc (MSG_NOTE, vect_location,
"Reduce using direct vector reduction.\n");
- vec_dest = vect_create_destination_var (scalar_dest, vectype);
- tmp = build1 (reduc_code, vectype, new_phi_result);
- epilog_stmt = gimple_build_assign (vec_dest, tmp);
- new_temp = make_ssa_name (vec_dest, epilog_stmt);
+ vec_elem_type = TREE_TYPE (TREE_TYPE (new_phi_result));
+ if (!useless_type_conversion_p (scalar_type, vec_elem_type))
+ {
+ tree tmp_dest =
+ vect_create_destination_var (scalar_dest, vec_elem_type);
+ tmp = build1 (reduc_code, vec_elem_type, new_phi_result);
+ epilog_stmt = gimple_build_assign (tmp_dest, tmp);
+ new_temp = make_ssa_name (tmp_dest, epilog_stmt);
+ gimple_assign_set_lhs (epilog_stmt, new_temp);
+ gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+
+ tmp = build1 (NOP_EXPR, scalar_type, new_temp);
+ }
+ else
+ tmp = build1 (reduc_code, scalar_type, new_phi_result);
+ epilog_stmt = gimple_build_assign (new_scalar_dest, tmp);
+ new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
gimple_assign_set_lhs (epilog_stmt, new_temp);
gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-
- extract_scalar_result = true;
+ scalar_results.safe_push (new_temp);
}
else
{
diff --git a/gcc/tree.def b/gcc/tree.def
index bd39e4b..c830e4b 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1161,10 +1161,9 @@ DEFTREECODE (TRANSACTION_EXPR, "transaction_expr", tcc_expression, 1)
result (e.g. summing the elements of the vector, finding the minimum over
the vector elements, etc).
Operand 0 is a vector.
- The expression returns a vector of the same type, with the first
- element in the vector holding the result of the reduction of all elements
- of the operand. The content of the other elements in the returned vector
- is undefined. */
+ The expression returns a scalar, with type the same as the elements of the
+ vector, holding the result of the reduction of all elements of the operand.
+ */
DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
next prev parent reply other threads:[~2014-09-24 15:02 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-18 11:41 [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114 Alan Lawrence
2014-09-18 11:45 ` [PATCH 1/14][AArch64] Temporarily remove aarch64_gimple_fold_builtin code for reduction operations Alan Lawrence
2014-09-24 9:41 ` Marcus Shawcroft
2014-09-18 11:51 ` [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result Alan Lawrence
2014-09-22 10:34 ` Richard Biener
2014-09-22 13:23 ` Alan Lawrence
2014-09-24 15:02 ` Alan Lawrence [this message]
2014-09-24 18:08 ` Segher Boessenkool
2014-09-25 16:07 ` Alan Lawrence
2014-09-18 11:54 ` [PATCH 3/14] Add new optabs for reducing vectors to scalars Alan Lawrence
2014-09-22 10:40 ` Richard Biener
2014-09-22 13:26 ` Alan Lawrence
2014-09-22 13:38 ` Richard Biener
2014-09-25 14:33 ` [PATCH/RFC v2 " Alan Lawrence
2014-09-25 15:31 ` Richard Biener
2014-09-25 16:12 ` Alan Lawrence
2014-09-25 19:20 ` Segher Boessenkool
2014-09-18 11:59 ` [PATCH 4/14][AArch64] Use new reduc_plus_scal optabs, inc. for __builtins Alan Lawrence
2014-09-24 9:44 ` Marcus Shawcroft
2014-09-18 12:02 ` [PATCH 5/14][AArch64] Use new reduc_[us](min|max)_scal optabs, inc. for builtins Alan Lawrence
2014-09-24 9:47 ` Marcus Shawcroft
2014-09-18 12:05 ` [PATCH 6/14][AArch64] Restore gimple_folding of reduction intrinsics Alan Lawrence
2014-09-24 9:48 ` Marcus Shawcroft
2014-09-18 12:19 ` [PATCH 7/14][Testsuite] Add tests of reductions using whole-vector-shifts (multiplication) Alan Lawrence
2014-09-22 10:41 ` Richard Biener
2014-09-18 12:25 ` [PATCH 8/14][Testsuite] Add tests of reductions using whole-vector-shifts (ior) Alan Lawrence
2014-09-22 10:42 ` Richard Biener
2014-09-18 12:27 ` [PATCH 9/14] Enforce whole-vector-shifts to always be by a whole number of elements Alan Lawrence
2014-09-22 10:50 ` Richard Biener
2014-09-18 12:34 ` [PATCH 10/14][AArch64] Implement vec_shr optab Alan Lawrence
2014-09-18 12:35 ` [PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab Alan Lawrence
2014-09-22 10:52 ` Richard Biener
2014-10-27 18:45 ` Alan Lawrence
2014-10-27 20:24 ` Richard Biener
2014-09-18 12:43 ` [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral Alan Lawrence
2014-09-18 13:12 ` David Edelsohn
2014-09-22 13:27 ` Bill Schmidt
2014-09-22 10:58 ` Richard Biener
2014-09-18 12:45 ` [PATCH 13/14][AArch64_be] Fix vec_shr pattern to correctly implement endianness-neutral optab Alan Lawrence
2014-09-22 10:52 ` Richard Biener
2014-09-18 12:48 ` [PATCH 14/14][Vectorizer] Tidy up vect_create_epilog / use_scalar_result Alan Lawrence
2014-09-22 10:53 ` Richard Biener
2014-11-14 17:29 ` PUSHED: " Alan Lawrence
2014-09-18 12:58 ` [PATCH/RFC 15 / 14+2][RS6000] Remove vec_shl and (hopefully) fix vec_shr Alan Lawrence
2014-09-23 12:50 ` David Edelsohn
2014-09-18 13:02 ` [PATCH 16 / 14+2][MIPS] " Alan Lawrence
2014-09-22 11:21 ` [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114 Richard Biener
2014-09-22 11:26 ` Richard Biener
2014-10-06 17:31 ` Alan Lawrence
[not found] ` <5432D1A5.6080208@arm.com>
2014-10-07 7:45 ` Richard Biener
2014-10-07 7:46 ` Richard Biener
[not found] ` <5436C138.50208@arm.com>
2014-10-09 17:13 ` Alan Lawrence
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5422DCF3.9080206@arm.com \
--to=alan.lawrence@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=rguenther@suse.de \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).