From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
richard.sandiford@arm.com
Subject: Re: [aarch64] Implement determine_suggested_unroll_factor
Date: Thu, 31 Mar 2022 17:15:23 +0100 [thread overview]
Message-ID: <7135cc5f-fb71-c648-dc58-bf56b0618e0d@arm.com> (raw)
In-Reply-To: <mpt4k3iyvf8.fsf@arm.com>
[-- Attachment #1: Type: text/plain, Size: 15241 bytes --]
On 28/03/2022 15:59, Richard Sandiford wrote:
> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>> Hi,
>>
>> Addressed all of your comments bar the pred ops one.
>>
>> Is this OK?
>>
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64.cc (aarch64_vector_costs): Define
>> determine_suggested_unroll_factor and m_nosve_pattern.
>> (determine_suggested_unroll_factor): New function.
>> (aarch64_vector_costs::add_stmt_cost): Check for a qualifying
>> pattern
>> to set m_nosve_pattern.
>> (aarch64_vector_costs::finish_costs): Use
>> determine_suggested_unroll_factor.
>> * config/aarch64/aarch64.opt (aarch64-vect-unroll-limit): New.
>>
>> On 16/03/2022 18:01, Richard Sandiford wrote:
>>> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>>>> Hi,
>>>>
>>>> This patch implements the costing function
>>>> determine_suggested_unroll_factor for aarch64.
>>>> It determines the unrolling factor by dividing the number of X
>>>> operations we can do per cycle by the number of X operations in the loop
>>>> body, taking this information from the vec_ops analysis during vector
>>>> costing and the available issue_info information.
>>>> We multiply the dividend by a potential reduction_latency, to improve
>>>> our pipeline utilization if we are stalled waiting on a particular
>>>> reduction operation.
>>>>
>>>> Right now we also have a work around for vectorization choices where the
>>>> main loop uses a NEON mode and predication is available, such that if
>>>> the main loop makes use of a NEON pattern that is not directly supported
>>>> by SVE we do not unroll, as that might cause performance regressions in
>>>> cases where we would enter the original main loop's VF. As an example if
>>>> you have a loop where you could use AVG_CEIL with a V8HI mode, you would
>>>> originally get 8x NEON using AVG_CEIL followed by a 8x SVE predicated
>>>> epilogue, using other instructions. Whereas with the unrolling you would
>>>> end up with 16x AVG_CEIL NEON + 8x SVE predicated loop, thus skipping
>>>> the original 8x NEON. In the future, we could handle this differently,
>>>> by either using a different costing model for epilogues, or potentially
>>>> vectorizing more than one single epilogue.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/aarch64/aarch64.cc (aarch64_vector_costs): Define
>>>> determine_suggested_unroll_factor.
>>>> (determine_suggested_unroll_factor): New function.
>>>> (aarch64_vector_costs::finish_costs): Use
>>>> determine_suggested_unroll_factor.
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>>>> index b5687aab59f630920e51b742b80a540c3a56c6c8..9d3a607d378d6a2792efa7c6dece2a65c24e4521 100644
>>>> --- a/gcc/config/aarch64/aarch64.cc
>>>> +++ b/gcc/config/aarch64/aarch64.cc
>>>> @@ -15680,6 +15680,7 @@ private:
>>>> unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
>>>> unsigned int);
>>>> bool prefer_unrolled_loop () const;
>>>> + unsigned int determine_suggested_unroll_factor ();
>>>>
>>>> /* True if we have performed one-time initialization based on the
>>>> vec_info. */
>>>> @@ -16768,6 +16769,105 @@ adjust_body_cost_sve (const aarch64_vec_op_count *ops,
>>>> return sve_cycles_per_iter;
>>>> }
>>>>
>>>> +unsigned int
>>>> +aarch64_vector_costs::determine_suggested_unroll_factor ()
>>>> +{
>>>> + auto *issue_info = aarch64_tune_params.vec_costs->issue_info;
>>>> + if (!issue_info)
>>>> + return 1;
>>>> + bool sve = false;
>>>> + if (aarch64_sve_mode_p (m_vinfo->vector_mode))
>>> Other code uses m_vec_flags & VEC_ANY_SVE for this.
>>>
>>>> + {
>>>> + if (!issue_info->sve)
>>>> + return 1;
>>>> + sve = true;
>>>> + }
>>>> + else
>>>> + {
>>>> + if (!issue_info->advsimd)
>>>> + return 1;
>>> The issue info should instead be taken from vec_ops.simd_issue_info ()
>>> in the loop below. It can vary for each entry.
>>>
>>>> + /* If we are trying to unroll a NEON main loop that contains patterns
>>> s/a NEON/an Advanced SIMD/
>>>
>>>> + that we do not support with SVE and we might use a predicated
>>>> + epilogue, we need to be conservative and block unrolling as this might
>>>> + lead to a less optimal loop for the first and only epilogue using the
>>>> + original loop's vectorization factor.
>>>> + TODO: Remove this constraint when we add support for multiple epilogue
>>>> + vectorization. */
>>>> + if (partial_vectors_supported_p ()
>>>> + && param_vect_partial_vector_usage != 0
>>>> + && !TARGET_SVE2)
>>>> + {
>>>> + unsigned int i;
>>>> + stmt_vec_info stmt_vinfo;
>>>> + FOR_EACH_VEC_ELT (m_vinfo->stmt_vec_infos, i, stmt_vinfo)
>>>> + {
>>>> + if (is_pattern_stmt_p (stmt_vinfo))
>>>> + {
>>>> + gimple *stmt = stmt_vinfo->stmt;
>>>> + if (is_gimple_call (stmt)
>>>> + && gimple_call_internal_p (stmt))
>>>> + {
>>>> + enum internal_fn ifn
>>>> + = gimple_call_internal_fn (stmt);
>>>> + switch (ifn)
>>>> + {
>>>> + case IFN_AVG_FLOOR:
>>>> + case IFN_AVG_CEIL:
>>>> + return 1;
>>>> + default:
>>>> + break;
>>>> + }
>>>> + }
>>>> + }
>>>> + }
>>>> + }
>>> I think we should instead record this during add_stmt_cost, like we do
>>> for the other data that this function uses.
>>>
>>> We need to drop the partial_vectors_supported_p () condition though:
>>> enabling SVE must not make Advanced SIMD code worse. So for now,
>>> Advanced SIMD-only code will have to suffer the missed unrolling
>>> opportunity too.
>>>
>>>> + }
>>>> +
>>>> + unsigned int max_unroll_factor = 1;
>>>> + aarch64_simd_vec_issue_info const *vec_issue
>>>> + = sve ? issue_info->sve : issue_info->advsimd;
>>>> + for (auto vec_ops : m_ops)
>>>> + {
>>>> + /* Limit unroll factor to 4 for now. */
>>>> + unsigned int unroll_factor = 4;
>>> Would be good to make this an aarch64-specific --param in case people
>>> want to play with different values.
>>>
>>>> + unsigned int factor
>>>> + = vec_ops.reduction_latency > 1 ? vec_ops.reduction_latency : 1;
>>>> + unsigned int temp;
>>>> +
>>>> + /* Sanity check, this should never happen. */
>>>> + if ((vec_ops.stores + vec_ops.loads + vec_ops.general_ops) == 0)
>>>> + return 1;
>>>> +
>>>> + /* Check stores. */
>>>> + if (vec_ops.stores > 0)
>>>> + {
>>>> + temp = CEIL (factor * vec_issue->stores_per_cycle,
>>>> + vec_ops.stores);
>>>> + unroll_factor = MIN (unroll_factor, temp);
>>>> + }
>>>> +
>>>> + /* Check loads. */
>>>> + if (vec_ops.loads > 0)
>>>> + {
>>>> + temp = CEIL (factor * vec_issue->loads_stores_per_cycle,
>>>> + vec_ops.loads);
>>> This should use loads + stores rather than just loads.
>>>
>>>> + unroll_factor = MIN (unroll_factor, temp);
>>>> + }
>>>> +
>>>> + /* Check general ops. */
>>>> + if (vec_ops.general_ops > 0)
>>>> + {
>>>> + temp = CEIL (factor * vec_issue->general_ops_per_cycle,
>>>> + vec_ops.general_ops);
>>>> + unroll_factor = MIN (unroll_factor, temp);
>>>> + }
>>> In principle we should also take pred_ops into account when
>>> sve_issue_info () is nonnull, while ignoring the pred_ops associated
>>> with loop predication (added by analyze_loop_vinfo). I guess we
>>> should maintain two pred_op counts.
>>>
>>> That's just a suggestion for a possible future improvement though.
>>> It doesn't need to be done as part of this patch.
>>>
>>> Thanks,
>>> Richard
>>>
>>>> + max_unroll_factor = MAX (max_unroll_factor, unroll_factor);
>>>> + }
>>>> +
>>>> + /* Make sure unroll factor is power of 2. */
>>>> + return 1 << ceil_log2 (max_unroll_factor);
>>>> +}
>>>> +
>>>> /* BODY_COST is the cost of a vector loop body. Adjust the cost as necessary
>>>> and return the new cost. */
>>>> unsigned int
>>>> @@ -16904,8 +17004,11 @@ aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
>>>> if (loop_vinfo
>>>> && m_vec_flags
>>>> && aarch64_use_new_vector_costs_p ())
>>>> - m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
>>>> - m_costs[vect_body]);
>>>> + {
>>>> + m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
>>>> + m_costs[vect_body]);
>>>> + m_suggested_unroll_factor = determine_suggested_unroll_factor ();
>>>> + }
>>>>
>>>> /* Apply the heuristic described above m_stp_sequence_cost. Prefer
>>>> the scalar code in the event of a tie, since there is more chance
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index 5da3d14dc357ba01351bca961af4f100a89665e1..3e4595889ce72ff4f1a1e603d5c4fa93be6a8ff4 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -15637,6 +15637,7 @@ private:
>> unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
>> unsigned int);
>> bool prefer_unrolled_loop () const;
>> + unsigned int determine_suggested_unroll_factor ();
>>
>> /* True if we have performed one-time initialization based on the
>> vec_info. */
>> @@ -15699,6 +15700,10 @@ private:
>> or vector loop. There is one entry for each tuning option of
>> interest. */
>> auto_vec<aarch64_vec_op_count, 2> m_ops;
>> +
>> + /* This loop uses a pattern that is not supported by SVE, but is supported by
>> + Advanced SIMD and SVE2. */
>> + bool m_nosve_pattern = false;
> This is true for other things besides AVG_FLOOR/CEIL. I think we should
> just accept that this is an x264 hack ;-) and name it “m_has_avg” instead.
>
> Could you put it with the other bool field? That would lead to a nicer
> layout, although it hardly matters much here.
>
>> };
>>
>> aarch64_vector_costs::aarch64_vector_costs (vec_info *vinfo,
>> @@ -16642,6 +16647,25 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
>> as one iteration of the SVE loop. */
>> if (where == vect_body && m_unrolled_advsimd_niters)
>> m_unrolled_advsimd_stmts += count * m_unrolled_advsimd_niters;
>> +
>> + if (is_pattern_stmt_p (stmt_info))
>> + {
> Not sure this is necessary. We'd want the same thing if (for whatever
> reason) we had a pre-existing IFN_AVG_FLOOR/CEIL in future.
>
>> + gimple *stmt = stmt_info->stmt;
>> + if (is_gimple_call (stmt)
>> + && gimple_call_internal_p (stmt))
>> + {
>> + enum internal_fn ifn
>> + = gimple_call_internal_fn (stmt);
>> + switch (ifn)
> Very minor nit, but might as well switch on gimple_call_internal_fn (stmt)
> directly, to avoid the awkward line break.
>
>> + {
>> + case IFN_AVG_FLOOR:
>> + case IFN_AVG_CEIL:
>> + m_nosve_pattern = true;
>> + default:
>> + break;
>> + }
>> + }
>> + }
>> }
>> return record_stmt_cost (stmt_info, where, (count * stmt_cost).ceil ());
>> }
>> @@ -16725,6 +16749,68 @@ adjust_body_cost_sve (const aarch64_vec_op_count *ops,
>> return sve_cycles_per_iter;
>> }
>>
>> +unsigned int
>> +aarch64_vector_costs::determine_suggested_unroll_factor ()
>> +{
>> + bool sve = m_vec_flags & VEC_ANY_SVE;
>> + /* If we are trying to unroll an Advanced SIMD main loop that contains
>> + patterns that we do not support with SVE and we might use a predicated
>> + epilogue, we need to be conservative and block unrolling as this might
>> + lead to a less optimal loop for the first and only epilogue using the
>> + original loop's vectorization factor.
>> + TODO: Remove this constraint when we add support for multiple epilogue
>> + vectorization. */
>> + if (!sve && TARGET_SVE2 && m_nosve_pattern)
>> + return 1;
> Shouldn't this be !TARGET_SVE2?
>
>> +
>> + unsigned int max_unroll_factor = 1;
>> + for (auto vec_ops : m_ops)
>> + {
>> + aarch64_simd_vec_issue_info const *vec_issue
>> + = vec_ops.simd_issue_info ();
>> + if (!vec_issue)
>> + return 1;
>> + /* Limit unroll factor to a value adjustable by the user, the default
>> + value is 4. */
>> + unsigned int unroll_factor = aarch64_vect_unroll_limit;
>> + unsigned int factor
>> + = vec_ops.reduction_latency > 1 ? vec_ops.reduction_latency : 1;
>> + unsigned int temp;
>> +
>> + /* Sanity check, this should never happen. */
>> + if ((vec_ops.stores + vec_ops.loads + vec_ops.general_ops) == 0)
>> + return 1;
>> +
>> + /* Check stores. */
>> + if (vec_ops.stores > 0)
>> + {
>> + temp = CEIL (factor * vec_issue->stores_per_cycle,
>> + vec_ops.stores);
>> + unroll_factor = MIN (unroll_factor, temp);
>> + }
>> +
>> + /* Check loads + stores. */
>> + if (vec_ops.loads > 0)
>> + {
>> + temp = CEIL (factor * vec_issue->loads_stores_per_cycle,
>> + vec_ops.loads + vec_ops.stores);
>> + unroll_factor = MIN (unroll_factor, temp);
>> + }
>> +
>> + /* Check general ops. */
>> + if (vec_ops.general_ops > 0)
>> + {
>> + temp = CEIL (factor * vec_issue->general_ops_per_cycle,
>> + vec_ops.general_ops);
>> + unroll_factor = MIN (unroll_factor, temp);
>> + }
>> + max_unroll_factor = MAX (max_unroll_factor, unroll_factor);
>> + }
>> +
>> + /* Make sure unroll factor is power of 2. */
>> + return 1 << ceil_log2 (max_unroll_factor);
>> +}
>> +
>> /* BODY_COST is the cost of a vector loop body. Adjust the cost as necessary
>> and return the new cost. */
>> unsigned int
>> @@ -16861,8 +16947,11 @@ aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
>> if (loop_vinfo
>> && m_vec_flags
>> && aarch64_use_new_vector_costs_p ())
>> - m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
>> - m_costs[vect_body]);
>> + {
>> + m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
>> + m_costs[vect_body]);
>> + m_suggested_unroll_factor = determine_suggested_unroll_factor ();
>> + }
>>
>> /* Apply the heuristic described above m_stp_sequence_cost. Prefer
>> the scalar code in the event of a tie, since there is more chance
>> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
>> index 98ce9c0ab61709b29bd29f3853f025e3a5a1bef2..92220b26ee2bf9f95c9a387c3155779596ee5ad5 100644
>> --- a/gcc/config/aarch64/aarch64.opt
>> +++ b/gcc/config/aarch64/aarch64.opt
>> @@ -292,3 +292,7 @@ Constant memmove size in bytes above which to start using MOPS sequence.
>> -param=aarch64-mops-memset-size-threshold=
>> Target Joined UInteger Var(aarch64_mops_memset_size_threshold) Init(256) Param
>> Constant memset size in bytes from which to start using MOPS sequence.
>> +
>> +-param=aarch64-vect-unroll-limit=
>> +Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
>> +Limit how much the autovectorizer may unroll a loop.
> This needs to be documented in invoke.texi, alongside the existing
> AArch64-specific params.
>
> OK with those changes, thanks.
>
> Richard
Committed attached patch.
[-- Attachment #2: unroll_aarch64_3.patch --]
[-- Type: text/plain, Size: 5976 bytes --]
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5da3d14dc357ba01351bca961af4f100a89665e1..d66626748e5edc46e46edf10e243114a9f74be97 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -15637,11 +15637,16 @@ private:
unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
unsigned int);
bool prefer_unrolled_loop () const;
+ unsigned int determine_suggested_unroll_factor ();
/* True if we have performed one-time initialization based on the
vec_info. */
bool m_analyzed_vinfo = false;
+ /* This loop uses an average operation that is not supported by SVE, but is
+ supported by Advanced SIMD and SVE2. */
+ bool m_has_avg = false;
+
/* - If M_VEC_FLAGS is zero then we're costing the original scalar code.
- If M_VEC_FLAGS & VEC_ADVSIMD is nonzero then we're costing Advanced
SIMD code.
@@ -16642,6 +16647,21 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
as one iteration of the SVE loop. */
if (where == vect_body && m_unrolled_advsimd_niters)
m_unrolled_advsimd_stmts += count * m_unrolled_advsimd_niters;
+
+ /* Detect the use of an averaging operation. */
+ gimple *stmt = stmt_info->stmt;
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt))
+ {
+ switch (gimple_call_internal_fn (stmt))
+ {
+ case IFN_AVG_FLOOR:
+ case IFN_AVG_CEIL:
+ m_has_avg = true;
+ default:
+ break;
+ }
+ }
}
return record_stmt_cost (stmt_info, where, (count * stmt_cost).ceil ());
}
@@ -16725,6 +16745,68 @@ adjust_body_cost_sve (const aarch64_vec_op_count *ops,
return sve_cycles_per_iter;
}
+unsigned int
+aarch64_vector_costs::determine_suggested_unroll_factor ()
+{
+ bool sve = m_vec_flags & VEC_ANY_SVE;
+ /* If we are trying to unroll an Advanced SIMD main loop that contains
+ an averaging operation that we do not support with SVE and we might use a
+ predicated epilogue, we need to be conservative and block unrolling as
+ this might lead to a less optimal loop for the first and only epilogue
+ using the original loop's vectorization factor.
+ TODO: Remove this constraint when we add support for multiple epilogue
+ vectorization. */
+ if (!sve && !TARGET_SVE2 && m_has_avg)
+ return 1;
+
+ unsigned int max_unroll_factor = 1;
+ for (auto vec_ops : m_ops)
+ {
+ aarch64_simd_vec_issue_info const *vec_issue
+ = vec_ops.simd_issue_info ();
+ if (!vec_issue)
+ return 1;
+ /* Limit unroll factor to a value adjustable by the user, the default
+ value is 4. */
+ unsigned int unroll_factor = aarch64_vect_unroll_limit;
+ unsigned int factor
+ = vec_ops.reduction_latency > 1 ? vec_ops.reduction_latency : 1;
+ unsigned int temp;
+
+ /* Sanity check, this should never happen. */
+ if ((vec_ops.stores + vec_ops.loads + vec_ops.general_ops) == 0)
+ return 1;
+
+ /* Check stores. */
+ if (vec_ops.stores > 0)
+ {
+ temp = CEIL (factor * vec_issue->stores_per_cycle,
+ vec_ops.stores);
+ unroll_factor = MIN (unroll_factor, temp);
+ }
+
+ /* Check loads + stores. */
+ if (vec_ops.loads > 0)
+ {
+ temp = CEIL (factor * vec_issue->loads_stores_per_cycle,
+ vec_ops.loads + vec_ops.stores);
+ unroll_factor = MIN (unroll_factor, temp);
+ }
+
+ /* Check general ops. */
+ if (vec_ops.general_ops > 0)
+ {
+ temp = CEIL (factor * vec_issue->general_ops_per_cycle,
+ vec_ops.general_ops);
+ unroll_factor = MIN (unroll_factor, temp);
+ }
+ max_unroll_factor = MAX (max_unroll_factor, unroll_factor);
+ }
+
+ /* Make sure unroll factor is power of 2. */
+ return 1 << ceil_log2 (max_unroll_factor);
+}
+
/* BODY_COST is the cost of a vector loop body. Adjust the cost as necessary
and return the new cost. */
unsigned int
@@ -16861,8 +16943,11 @@ aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
if (loop_vinfo
&& m_vec_flags
&& aarch64_use_new_vector_costs_p ())
- m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
- m_costs[vect_body]);
+ {
+ m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
+ m_costs[vect_body]);
+ m_suggested_unroll_factor = determine_suggested_unroll_factor ();
+ }
/* Apply the heuristic described above m_stp_sequence_cost. Prefer
the scalar code in the event of a tie, since there is more chance
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 98ce9c0ab61709b29bd29f3853f025e3a5a1bef2..92220b26ee2bf9f95c9a387c3155779596ee5ad5 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -292,3 +292,7 @@ Constant memmove size in bytes above which to start using MOPS sequence.
-param=aarch64-mops-memset-size-threshold=
Target Joined UInteger Var(aarch64_mops_memset_size_threshold) Init(256) Param
Constant memset size in bytes from which to start using MOPS sequence.
+
+-param=aarch64-vect-unroll-limit=
+Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
+Limit how much the autovectorizer may unroll a loop.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fb3dec4ab0c37810d4409fa6da6a82f0154b0a3a..8c27d55d1edf8bc0f4df6fed7e1df3523bc9686c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15174,6 +15174,12 @@ If this parameter is set to @var{n}, GCC will not use this heuristic
for loops that are known to execute in fewer than @var{n} Advanced
SIMD iterations.
+@item aarch64-vect-unroll-limit
+The vectorizer will use available tuning information to determine whether it
+would be beneficial to unroll the main vectorized loop and by how much. This
+parameter set's the upper bound of how much the vectorizer will unroll the main
+loop. The default value is four.
+
@end table
@end table
prev parent reply other threads:[~2022-03-31 16:15 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-16 14:59 Andre Vieira (lists)
2022-03-16 18:01 ` Richard Sandiford
2022-03-25 17:11 ` Andre Vieira (lists)
2022-03-28 14:59 ` Richard Sandiford
2022-03-31 16:15 ` Andre Vieira (lists) [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7135cc5f-fb71-c648-dc58-bf56b0618e0d@arm.com \
--to=andre.simoesdiasvieira@arm.com \
--cc=Kyrylo.Tkachov@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).