* [patch] jump threading multiple paths that start from the same BB
@ 2017-11-07 17:38 Aldy Hernandez
2017-11-27 23:31 ` Jeff Law
2018-06-29 19:04 ` Jeff Law
0 siblings, 2 replies; 17+ messages in thread
From: Aldy Hernandez @ 2017-11-07 17:38 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 3267 bytes --]
[One more time, but without rejected HTML mail, because apparently this
is my first post to gcc-patches *ever* ;-)].
Howdy!
While poking around in the backwards threader I noticed that we bail if
we have already seen a starting BB.
/* Do not jump-thread twice from the same block. */
if (bitmap_bit_p (threaded_blocks, entry->src->index)
This limitation discards paths that are sub-paths of paths that have
already been threaded.
The following patch scans the remaining to-be-threaded paths to identify
if any of them start from the same point, and are thus sub-paths of the
just-threaded path. By removing the common prefix of blocks in upcoming
threadable paths, and then rewiring the first non-common block
appropriately, we expose new threading opportunities, since we are no
longer starting from the same BB. We also simplify the would-be
threaded paths, because we don't duplicate already duplicated paths.
This sounds confusing, but the documentation for the entry point to my
patch (adjust_paths_after_duplication) shows an actual example:
+/* After an FSM path has been jump threaded, adjust the remaining FSM
+ paths that are subsets of this path, so these paths can be safely
+ threaded within the context of the new threaded path.
+
+ For example, suppose we have just threaded:
+
+ 5 -> 6 -> 7 -> 8 -> 12 => 5 -> 6' -> 7' -> 8' -> 12'
+
+ And we have an upcoming threading candidate:
+ 5 -> 6 -> 7 -> 8 -> 15 -> 20
+
+ This function adjusts the upcoming path into:
+ 8' -> 15 -> 20
Notice that we will no longer have two paths that start from the same
BB. One will start with bb5, while the adjusted path will start with
bb8'. With this we kill two birds-- we are able to thread more paths,
and these paths will avoid duplicating a whole mess of things that have
already been threaded.
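To make that concrete, here is a small self-contained sketch of the two steps
adjust_paths_after_duplication performs: chop off the prefix a candidate
shares with the just-threaded path, then re-source the first surviving edge
from the duplicate of its source block.  This is plain C++ with invented
types (and 60/70/80 standing in for 6'/7'/8'), not the GCC implementation;
it just illustrates the idea and builds with g++ -std=c++11:

#include <cstdio>
#include <map>
#include <vector>

/* An edge is just a (src, dest) pair of block numbers in this sketch.  */
struct edge_t { int src, dest; };
typedef std::vector<edge_t> path_t;

/* Trim the prefix CAND shares with the just-threaded CURR, then make the
   first remaining edge start from the copy of its source block.  */
static void
adjust_candidate (const path_t &curr, path_t &cand,
                  const std::map<int, int> &bb_copy)
{
  size_t j = 0;
  while (j < curr.size () && j < cand.size ()
         && curr[j].src == cand[j].src && curr[j].dest == cand[j].dest)
    ++j;
  if (j == 0 || j == cand.size ())
    return;  /* Nothing shared, or fully covered (the real code deletes
                the candidate outright in the latter case).  */
  cand.erase (cand.begin (), cand.begin () + j);
  std::map<int, int>::const_iterator it = bb_copy.find (cand[0].src);
  if (it != bb_copy.end ())
    cand[0].src = it->second;  /* Start from the duplicated block.  */
}

int
main ()
{
  /* Just threaded 5->6->7->8->12; blocks 6, 7, 8 were copied.  */
  path_t curr = { {5, 6}, {6, 7}, {7, 8}, {8, 12} };
  std::map<int, int> bb_copy = { {6, 60}, {7, 70}, {8, 80} };

  /* Upcoming candidate 5->6->7->8->15->20.  */
  path_t cand = { {5, 6}, {6, 7}, {7, 8}, {8, 15}, {15, 20} };
  adjust_candidate (curr, cand, bb_copy);

  for (size_t i = 0; i < cand.size (); ++i)
    printf ("%d -> %d, ", cand[i].src, cand[i].dest);  /* 80 -> 15, 15 -> 20,  */
  printf ("\n");
  return 0;
}

Run on the example above, the candidate comes out as 8' -> 15 -> 20, which is
exactly the adjustment the patch performs (the real code does the rewiring via
get_bb_copy/find_edge and gives up if the new edge no longer exists).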
The attached patch is a subset of some upcoming work that can live on
its own. It bootstraps and regtests. Also, by running it on a handful
of .ii files, I can see that we are able to thread sub-paths that we
previously dropped on the floor. More is better, right? :)
To test this, I stole Jeff's method of using cachegrind to benchmark
instruction counts and conditional branches
(https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).
Basically, I bootstrapped two compilers, with and without improvements,
and used each to build a stage1 trunk. Each of these stage1 trunks was then
used on 246 .ii GCC files I have lying around from a bootstrap at some
random point last year. I used no special flags on the builds apart from
--enable-languages=c,c++.
Although I would've wished for a larger improvement, this work comes for
free, as it's just a subset of other work I'm hacking on.
Without further ado, here are my monumental, earth shattering improvements:
Conditional branches
Without patch: 411846839709
With patch: 411831330316
%changed: -0.0037660%
Number of instructions
Without patch: 2271844650634
With patch: 2271761153578
%changed: -0.0036754%
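(For the record, the percentages are just the delta over the baseline count:

    411831330316 - 411846839709   = -15509393 conditional branches (~ -0.0038%)
    2271761153578 - 2271844650634 = -83497056 instructions         (~ -0.0037%)

i.e. roughly 15.5 million fewer dynamic branches and 83.5 million fewer
instructions over the 246-file run.)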
OK for trunk?
Aldy
p.s. There is a lot of debugging/dumping code in my patch, which I can
gladly remove if/when approved. It helps keep my head straight while
looking at this spaghetti :).
[-- Attachment #2: curr.patch --]
[-- Type: text/x-patch, Size: 8805 bytes --]
gcc/
* tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid
dereferencing path[] beyond its length.
(debug_path): New.
(debug_all_paths): New.
(rewire_first_differing_edge): New.
(adjust_paths_after_duplication): New.
(duplicate_thread_path): Call adjust_paths_after_duplication.
Add new argument.
(thread_through_all_blocks): Add new argument to
duplicate_thread_path.
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 1dab0f1fab4..53ac7181b4b 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1763,7 +1763,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
{
vec<jump_thread_edge *> *path = paths[i];
- if ((*path)[1]->type != EDGE_COPY_SRC_JOINER_BLOCK)
+ if (path->length () > 1
+ && (*path)[1]->type != EDGE_COPY_SRC_JOINER_BLOCK)
{
edge e = (*path)[0]->e;
e->aux = (void *)path;
@@ -1783,7 +1784,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
{
vec<jump_thread_edge *> *path = paths[i];
- if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK)
+ if (path->length () > 1
+ && (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK)
{
/* Attach the path to the starting edge if none is yet recorded. */
if ((*path)[0]->e->aux == NULL)
@@ -1812,7 +1814,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
vec<jump_thread_edge *> *path = paths[i];
edge e = (*path)[0]->e;
- if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && e->aux == path)
+ if (path->length () > 1
+ && (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && e->aux == path)
{
unsigned int j;
for (j = 1; j < path->length (); j++)
@@ -1978,6 +1981,169 @@ bb_in_bbs (basic_block bb, basic_block *bbs, int n)
return false;
}
+DEBUG_FUNCTION void
+debug_path (FILE *dump_file, int pathno)
+{
+ vec<jump_thread_edge *> *p = paths[pathno];
+ fprintf (dump_file, "path: ");
+ for (unsigned i = 0; i < p->length (); ++i)
+ fprintf (dump_file, "%d -> %d, ",
+ (*p)[i]->e->src->index, (*p)[i]->e->dest->index);
+ fprintf (dump_file, "\n");
+}
+
+DEBUG_FUNCTION void
+debug_all_paths ()
+{
+ for (unsigned i = 0; i < paths.length (); ++i)
+ debug_path (stderr, i);
+}
+
+/* Rewire a jump_thread_edge so that the source block is now a
+ threaded source block.
+
+ PATH_NUM is an index into the global path table PATHS.
+ EDGE_NUM is the jump thread edge number into said path.
+
+ Returns TRUE if we were able to successfully rewire the edge. */
+
+static bool
+rewire_first_differing_edge (unsigned path_num, unsigned edge_num)
+{
+ vec<jump_thread_edge *> *path = paths[path_num];
+ edge &e = (*path)[edge_num]->e;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "rewiring edge candidate: %d -> %d\n",
+ e->src->index, e->dest->index);
+ basic_block src_copy = get_bb_copy (e->src);
+ if (src_copy == NULL)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "ignoring candidate: there is no src COPY\n");
+ return false;
+ }
+ edge new_edge = find_edge (src_copy, e->dest);
+ /* If the previously threaded paths created a flow graph where we
+ can no longer figure out where to go, give up. */
+ if (new_edge == NULL)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "ignoring candidate: we lost our way\n");
+ return false;
+ }
+ e = new_edge;
+ return true;
+}
+
+/* After an FSM path has been jump threaded, adjust the remaining FSM
+ paths that are subsets of this path, so these paths can be safely
+ threaded within the context of the new threaded path.
+
+ For example, suppose we have just threaded:
+
+ 5 -> 6 -> 7 -> 8 -> 12 => 5 -> 6' -> 7' -> 8' -> 12'
+
+ And we have an upcoming threading candidate:
+ 5 -> 6 -> 7 -> 8 -> 15 -> 20
+
+ This function adjusts the upcoming path into:
+ 8' -> 15 -> 20
+
+ CURR_PATH_NUM is an index into the global paths table. It
+ specifies the path that was just threaded. */
+
+static void
+adjust_paths_after_duplication (unsigned curr_path_num)
+{
+ vec<jump_thread_edge *> *curr_path = paths[curr_path_num];
+ gcc_assert ((*curr_path)[0]->type == EDGE_FSM_THREAD);
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "just threaded: ");
+ debug_path (dump_file, curr_path_num);
+ }
+
+ /* Iterate through all the other paths and adjust them. */
+ for (unsigned cand_path_num = 0; cand_path_num < paths.length (); )
+ {
+ if (cand_path_num == curr_path_num)
+ {
+ ++cand_path_num;
+ continue;
+ }
+ /* Make sure the candidate to adjust starts with the same path
+ as the recently threaded path and is an FSM thread. */
+ vec<jump_thread_edge *> *cand_path = paths[cand_path_num];
+ if ((*cand_path)[0]->type != EDGE_FSM_THREAD
+ || (*cand_path)[0]->e != (*curr_path)[0]->e)
+ {
+ ++cand_path_num;
+ continue;
+ }
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "adjusting candidate: ");
+ debug_path (dump_file, cand_path_num);
+ }
+
+ /* Chop off from the candidate path any prefix it shares with
+ the recently threaded path. */
+ unsigned minlength = MIN (curr_path->length (), cand_path->length ());
+ unsigned j;
+ for (j = 0; j < minlength; ++j)
+ {
+ edge cand_edge = (*cand_path)[j]->e;
+ edge curr_edge = (*curr_path)[j]->e;
+
+ /* Once the prefix no longer matches, adjust the first
+ non-matching edge to point from an adjusted edge to
+ wherever it was going. */
+ if (cand_edge != curr_edge)
+ {
+ gcc_assert (cand_edge->src == curr_edge->src);
+ if (!rewire_first_differing_edge (cand_path_num, j))
+ goto remove_candidate_from_list;
+ break;
+ }
+ }
+ if (j == minlength)
+ {
+ /* If we consumed the max subgraph we could look at, and
+ still didn't find any different edges, it's the
+ last edge after MINLENGTH. */
+ if (cand_path->length () > minlength)
+ {
+ if (!rewire_first_differing_edge (cand_path_num, j))
+ goto remove_candidate_from_list;
+ }
+ else if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "adjusting first edge after MINLENGTH.\n");
+ }
+ if (j > 0)
+ {
+ /* If we are removing everything, delete the entire candidate. */
+ if (j == cand_path->length ())
+ {
+ remove_candidate_from_list:
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "adjusted candidate: [EMPTY]\n");
+ delete_jump_thread_path (cand_path);
+ paths.unordered_remove (cand_path_num);
+ continue;
+ }
+ /* Otherwise, just remove the redundant sub-path. */
+ cand_path->block_remove (0, j);
+ }
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "adjusted candidate: ");
+ debug_path (dump_file, cand_path_num);
+ }
+ ++cand_path_num;
+ }
+}
+
/* Duplicates a jump-thread path of N_REGION basic blocks.
The ENTRY edge is redirected to the duplicate of the region.
@@ -1985,11 +2151,14 @@ bb_in_bbs (basic_block bb, basic_block *bbs, int n)
and create a single fallthru edge pointing to the same destination as the
EXIT edge.
+ CURRENT_PATH_NO is an index into the global paths[] table
+ specifying the jump-thread path.
+
Returns false if it is unable to copy the region, true otherwise. */
static bool
duplicate_thread_path (edge entry, edge exit, basic_block *region,
- unsigned n_region)
+ unsigned n_region, unsigned current_path_no)
{
unsigned i;
struct loop *loop = entry->dest->loop_father;
@@ -2000,6 +2169,12 @@ duplicate_thread_path (edge entry, edge exit, basic_block *region,
if (!can_copy_bbs_p (region, n_region))
return false;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "\nabout to thread: ");
+ debug_path (dump_file, current_path_no);
+ }
+
/* Some sanity checking. Note that we do not check for all possible
missuses of the functions. I.e. if you ask to copy something weird,
it will work, but the state of structures probably will not be
@@ -2128,6 +2303,8 @@ duplicate_thread_path (edge entry, edge exit, basic_block *region,
free (region_copy);
+ adjust_paths_after_duplication (current_path_no);
+
free_original_copy_tables ();
return true;
}
@@ -2251,7 +2428,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
for (unsigned int j = 0; j < len - 1; j++)
region[j] = (*path)[j]->e->dest;
- if (duplicate_thread_path (entry, exit, region, len - 1))
+ if (duplicate_thread_path (entry, exit, region, len - 1, i))
{
/* We do not update dominance info. */
free_dominance_info (CDI_DOMINATORS);
* Re: [patch] jump threading multiple paths that start from the same BB
2017-11-07 17:38 [patch] jump threading multiple paths that start from the same BB Aldy Hernandez
@ 2017-11-27 23:31 ` Jeff Law
2017-11-29 20:24 ` Aldy Hernandez
2018-06-29 19:04 ` Jeff Law
1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2017-11-27 23:31 UTC (permalink / raw)
To: Aldy Hernandez; +Cc: gcc-patches
On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> [One more time, but without rejected HTML mail, because apparently this
> is my first post to gcc-patches *ever* ;-)].
>
> Howdy!
>
> While poking around in the backwards threader I noticed that we bail if
> we have already seen a starting BB.
>
>      /* Do not jump-thread twice from the same block. */
>      if (bitmap_bit_p (threaded_blocks, entry->src->index)
>
> This limitation discards paths that are sub-paths of paths that have
> already been threaded.
>
> The following patch scans the remaining to-be-threaded paths to identify
> if any of them start from the same point, and are thus sub-paths of the
> just-threaded path. By removing the common prefix of blocks in upcoming
> threadable paths, and then rewiring first non-common block
> appropriately, we expose new threading opportunities, since we are no
> longer starting from the same BB. We also simplify the would-be
> threaded paths, because we don't duplicate already duplicated paths.
>
> This sounds confusing, but the documentation for the entry point to my
> patch (adjust_paths_after_duplication) shows an actual example:
>
> +/* After an FSM path has been jump threaded, adjust the remaining FSM
> +   paths that are subsets of this path, so these paths can be safely
> +   threaded within the context of the new threaded path.
> +
> +   For example, suppose we have just threaded:
> +
> +   5 -> 6 -> 7 -> 8 -> 12      =>      5 -> 6' -> 7' -> 8' -> 12'
> +
> +   And we have an upcoming threading candidate:
> +   5 -> 6 -> 7 -> 8 -> 15 -> 20
> +
> +   This function adjusts the upcoming path into:
> +   8' -> 15 -> 20
So a nit on the comment. The destination of the target edge in a jump
threading path is not duplicated. So for the comment to be correct, I
think you need to change 12' to just 12.
Presumably BB8 ends in a conditional which we can't determine anything
about statically (otherwise we wouldn't have 8->12 and 8->15). It's
something that may confuse someone not familiar with this code. Though
full CFGs may be too painful to show here as well.
>
> Notice that we will no longer have two paths that start from the same
> BB. One will start with bb5, while the adjusted path will start with
> bb8'. With this we kill two birds-- we are able to thread more paths,
> and these paths will avoid duplicating a whole mess of things that have
> already been threaded.
Which is a good thing IMHO. Essentially with the code right now we have
to wait until the next backwards threading pass to pick up the second
jump thread. It's not really a secondary effect -- we found the
thread, but didn't have the support we needed in the copier to DTRT.
>
> The attached patch is a subset of some upcoming work that can live on
> its own. It bootstraps and regtests. Also, by running it on a handful
> of .ii files, I can see that we are able to thread sub-paths that we
> previously dropped on the floor. More is better, right? :)
Generally, yes. Similarly, the more thoroughly we can thread earlier,
the better. While my immediate goal is to remove tree-vrp's jump
threading, dropping one or two (of the four!) backwards passes is also
on the hitlist.
>
> To test this, I stole Jeff's method of using cachegrind to benchmark
> instruction counts and conditional branches
> (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).
:-)
>
> Basically, I bootstrapped two compilers, with and without improvements,
> and used each to build a stage1 trunk. Each of these stage1-trunks were
> used on 246 .ii GCC files I have lying around from a bootstrap some
> random point last year. I used no special flags on builds apart from
> --enable-languages=c,c++.
>
> Although I would've wished a larger improvement, this works comes for
> free, as it's just a subset of other work I'm hacking on.
Yup.
>
> Without further ado, here are my monumental, earth shattering improvements:
>
> Conditional branches
>    Without patch: 411846839709
>    With    patch: 411831330316
>         %changed: -0.0037660%
>
> Number of instructions
>    Without patch: 2271844650634
>    With    patch: 2271761153578
>         %changed: -0.0036754%
So that's pretty small. A really good improvement would be on the order
of a half-percent reduction in runtime conditional branches. I usually
test with .i files that have enable-checking turned on -- which tends to
give lots of opportunities due to the redundancies in our checking code.
On a positive note, you're eliminating roughly 4.5 other dynamic
instructions for every runtime conditional branch you remove, which is
actually a good ratio. 3.5 is what I typically see for a fairly
extensive amount of work. Patrick P's work last year was on the order
of 7.5. So while it didn't fire often, when it did it was highly effective.
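For concreteness, that ratio falls straight out of the counts quoted above,
assuming the instruction totals include the conditional branches themselves:

    branches eliminated:     411846839709 - 411831330316   = 15509393
    instructions eliminated: 2271844650634 - 2271761153578 = 83497056
    other insns per branch:  (83497056 - 15509393) / 15509393 ~= 4.4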
We have installed changes of this nature when they were part of a larger
set of changes, particularly if they helped us elsewhere (like
eliminating a key path to silence a bogus warning or addressing other
regressions).
I'm going to defer judgment right now. I will do some further testing
as I walk through the various uninitialized and similar warnings as well
as with any work to see if we can reasonably eliminate one or two of the
backwards passes. If it helps in either of those contexts then we have
a much stronger case to include now rather than defer until the larger
work lands.
> p.s. There is a lot of debugging/dumping code in my patch, which I can
> gladly remove if/when approved. It helps keep my head straight while
> looking at this spaghetti :).
Yea :-) Actually keeping some of it is probably useful. My brain turns
to mush as I move between the old style and new style jump threading
implementations. Better dumps would probably help.
So the changes to test that path->length () > 1 in mark_threaded_blocks
seem like they ought to go forward independently.
Conceptually it looks sane. A testcase or two would help.
Jeff
* Re: [patch] jump threading multiple paths that start from the same BB
2017-11-27 23:31 ` Jeff Law
@ 2017-11-29 20:24 ` Aldy Hernandez
2017-11-30 23:38 ` Jeff Law
0 siblings, 1 reply; 17+ messages in thread
From: Aldy Hernandez @ 2017-11-29 20:24 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
On 11/27/2017 06:27 PM, Jeff Law wrote:
> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>> Without further ado, here are my monumental, earth shattering improvements:
>>
>> Conditional branches
>>    Without patch: 411846839709
>>    With    patch: 411831330316
>>         %changed: -0.0037660%
>>
>> Number of instructions
>>    Without patch: 2271844650634
>>    With    patch: 2271761153578
>>         %changed: -0.0036754%
> So that's pretty small. A really good improvement would be on the order
> of a half-percent reduction in runtime conditional branches. I usually
> test with .i files that have enable-checking turned on -- which tends to
> give lots of opportunities due to the redundancies in our checking code.
>
> On a positive note, you're eliminating roughly 4.5 other dynamic
> instructions for every runtime conditional branch you remove, which is
> actually a good ratio. 3.5 is what I typically see for a fairly
> extensive amount of work. Patrick P's work last year was on the order
> of 7.5. So while it didn't fire often, when it did it was highly effective.
I've retested with .ii files gathered from a stage1 build with
--enable-checking=yes,all. The results are an order of magnitude better
but not impressive by any means:
Conditional branches
Without patch: 1333666327560
With patch: 1333333689781
%changed: -0.0249416%
Number of instructions
Without patch: 7348622241739
With patch: 7347240547621
%changed: -0.0188021%
>
> We have installed changes of this nature when they were part of a larger
> set of changes, particularly if they helped us elsewhere (like
> eliminating key path to silence a bogus warning or addressing other
> regressions).
I don't know if the above changes your view, but I am not losing sleep
over this, so no worries. I will just keep accumulating these along
with the myriad of other changes I have ;-).
Thanks for taking a look at this, and making sure I'm headed in the
right direction.
Aldy
* Re: [patch] jump threading multiple paths that start from the same BB
2017-11-29 20:24 ` Aldy Hernandez
@ 2017-11-30 23:38 ` Jeff Law
0 siblings, 0 replies; 17+ messages in thread
From: Jeff Law @ 2017-11-30 23:38 UTC (permalink / raw)
To: Aldy Hernandez; +Cc: gcc-patches
On 11/29/2017 11:49 AM, Aldy Hernandez wrote:
> On 11/27/2017 06:27 PM, Jeff Law wrote:
>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>
>>> Without further ado, here are my monumental, earth shattering
>>> improvements:
>>>
>>> Conditional branches
>>>    Without patch: 411846839709
>>>    With    patch: 411831330316
>>>         %changed: -0.0037660%
>>>
>>> Number of instructions
>>>    Without patch: 2271844650634
>>>    With    patch: 2271761153578
>>>         %changed: -0.0036754%
>> So that's pretty small. A really good improvement would be on the order
>> of a half-percent reduction in runtime conditional branches. I usually
>> test with .i files that have enable-checking turned on -- which tends to
>> give lots of opportunities due to the redundancies in our checking code.
>>
>> On a positive note, you're eliminating roughly 4.5 other dynamic
>> instructions for every runtime conditional branch you remove, which is
>> actually a good ratio. 3.5 is what I typically see for a fairly
>> extensive amount of work.  Patrick P's work last year was on the order
>> of 7.5. So while it didn't fire often, when it did it was highly
>> effective.
>
> I've retested with .ii files gathered from a stage1 build with
> --enable-checking=yes,all. The results are an order of magnitude better
> but not impressive by any means:
Ack. It'd need another order of magnitude to start getting into that
interesting range by itself :-) You're still hitting that 4.5 ratio of
non-branch to branch instructions eliminated. So it's higher value when
it applies than most of the stuff I've done through the years.
>> We have installed changes of this nature when they were part of a larger
>> set of changes, particularly if they helped us elsewhere (like
>> eliminating key path to silence a bogus warning or addressing other
>> regressions).
>
> I don't know if the above changes your view, but I am not losing sleep
> over this, so no worries. I will just keep accumulating these along
> with the myriad of other changes I have ;-).
I'm not sure it changes anything either. But I'll certainly keep it in
mind as I start digging deeper into the buglists for gcc-8. If it helps
in the bugfixing effort, then we'll have an interesting decision to make.
jeff
* Re: [patch] jump threading multiple paths that start from the same BB
2017-11-07 17:38 [patch] jump threading multiple paths that start from the same BB Aldy Hernandez
2017-11-27 23:31 ` Jeff Law
@ 2018-06-29 19:04 ` Jeff Law
2018-07-01 10:56 ` Aldy Hernandez
1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2018-06-29 19:04 UTC (permalink / raw)
To: Aldy Hernandez; +Cc: gcc-patches
[ Returning to another old patch... ]
On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> [One more time, but without rejected HTML mail, because apparently this
> is my first post to gcc-patches *ever* ;-)].
>
> Howdy!
>
> While poking around in the backwards threader I noticed that we bail if
> we have already seen a starting BB.
>
>       /* Do not jump-thread twice from the same block.  */
>       if (bitmap_bit_p (threaded_blocks, entry->src->index)
>
> This limitation discards paths that are sub-paths of paths that have
> already been threaded.
>
> The following patch scans the remaining to-be-threaded paths to identify
> if any of them start from the same point, and are thus sub-paths of the
> just-threaded path. By removing the common prefix of blocks in upcoming
> threadable paths, and then rewiring first non-common block
> appropriately, we expose new threading opportunities, since we are no
> longer starting from the same BB. We also simplify the would-be
> threaded paths, because we don't duplicate already duplicated paths.
>
> This sounds confusing, but the documentation for the entry point to my
> patch (adjust_paths_after_duplication) shows an actual example:
>
> +/* After an FSM path has been jump threaded, adjust the remaining FSM
> +   paths that are subsets of this path, so these paths can be safely
> +   threaded within the context of the new threaded path.
> +
> +   For example, suppose we have just threaded:
> +
> +   5 -> 6 -> 7 -> 8 -> 12      =>      5 -> 6' -> 7' -> 8' -> 12'
> +
> +   And we have an upcoming threading candidate:
> +   5 -> 6 -> 7 -> 8 -> 15 -> 20
> +
> +   This function adjusts the upcoming path into:
> +   8' -> 15 -> 20
>
> Notice that we will no longer have two paths that start from the same
> BB. One will start with bb5, while the adjusted path will start with
> bb8'. With this we kill two birds-- we are able to thread more paths,
> and these paths will avoid duplicating a whole mess of things that have
> already been threaded.
>
> The attached patch is a subset of some upcoming work that can live on
> its own. It bootstraps and regtests. Also, by running it on a handful
> of .ii files, I can see that we are able to thread sub-paths that we
> previously dropped on the floor.  More is better, right? :)
>
> To test this, I stole Jeff's method of using cachegrind to benchmark
> instruction counts and conditional branches
> (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).
>
> Basically, I bootstrapped two compilers, with and without improvements,
> and used each to build a stage1 trunk. Each of these stage1-trunks were
> used on 246 .ii GCC files I have lying around from a bootstrap some
> random point last year. I used no special flags on builds apart from
> --enable-languages=c,c++.
>
> Although I would've wished a larger improvement, this works comes for
> free, as it's just a subset of other work I'm hacking on.
>
> Without further ado, here are my monumental, earth shattering improvements:
>
> Conditional branches
>    Without patch: 411846839709
>    With    patch: 411831330316
>         %changed: -0.0037660%
>
> Number of instructions
>    Without patch: 2271844650634
>    With    patch: 2271761153578
>         %changed: -0.0036754%
>
>
> OK for trunk?
> Aldy
>
> p.s. There is a lot of debugging/dumping code in my patch, which I can
> gladly remove if/when approved. It helps keep my head straight while
> looking at this spaghetti :).
>
> curr.patch
>
>
> gcc/
>
> * tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid
> dereferencing path[] beyond its length.
> (debug_path): New.
> (debug_all_paths): New.
> (rewire_first_differing_edge): New.
> (adjust_paths_after_duplication): New.
> (duplicate_thread_path): Call adjust_paths_after_duplication.
> Add new argument.
> (thread_through_all_blocks): Add new argument to
> duplicate_thread_path.
This is fine for the trunk. I'd keep the dumping code as-is. It'll be
useful in the future :-)
>
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index 1dab0f1fab4..53ac7181b4b 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> +
> +/* Rewire a jump_thread_edge so that the source block is now a
> + threaded source block.
> +
> + PATH_NUM is an index into the global path table PATHS.
> + EDGE_NUM is the jump thread edge number into said path.
> +
> + Returns TRUE if we were able to successfully rewire the edge. */
> +
> +static bool
> +rewire_first_differing_edge (unsigned path_num, unsigned edge_num)
> +{
> + vec<jump_thread_edge *> *path = paths[path_num];
> + edge &e = (*path)[edge_num]->e;
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "rewiring edge candidate: %d -> %d\n",
> + e->src->index, e->dest->index);
> + basic_block src_copy = get_bb_copy (e->src);
> + if (src_copy == NULL)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "ignoring candidate: there is no src COPY\n");
> + return false;
> + }
> + edge new_edge = find_edge (src_copy, e->dest);
> + /* If the previously threaded paths created a flow graph where we
> + can no longer figure out where to go, give up. */
> + if (new_edge == NULL)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "ignoring candidate: we lost our way\n");
> + return false;
> + }
> + e = new_edge;
> + return true;
I was at first a bit confused about how this function had any visible
side effects. Then I realized that "e" is actually a reference to an
edge in the jump threading path and that seemingly dead assignment at
the end isn't really dead :-)
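The same effect in miniature (standalone C++ with an invented struct, not the
GCC code): binding a reference to the edge stored in the path means that final
assignment writes back into the path itself.

#include <cstdio>
#include <vector>

struct edge_t { int src, dest; };

int
main ()
{
  std::vector<edge_t> path (1);
  path[0].src = 8;
  path[0].dest = 12;

  /* Like "edge &e = (*path)[edge_num]->e" in rewire_first_differing_edge:
     E aliases the element stored in PATH.  */
  edge_t &e = path[0];

  /* The seemingly dead store: it updates path[0] through the reference.  */
  edge_t new_edge = { 80, 12 };
  e = new_edge;

  printf ("path[0]: %d -> %d\n", path[0].src, path[0].dest);  /* 80 -> 12 */
  return 0;
}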
Jeff
* Re: [patch] jump threading multiple paths that start from the same BB
2018-06-29 19:04 ` Jeff Law
@ 2018-07-01 10:56 ` Aldy Hernandez
2018-07-02 11:08 ` Christophe Lyon
0 siblings, 1 reply; 17+ messages in thread
From: Aldy Hernandez @ 2018-07-01 10:56 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
On 06/29/2018 02:50 PM, Jeff Law wrote:
> [ Returning to another old patch... ]
>
> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>> [One more time, but without rejected HTML mail, because apparently this
>> is my first post to gcc-patches *ever* ;-)].
>>
>> Howdy!
>>
>> While poking around in the backwards threader I noticed that we bail if
>> we have already seen a starting BB.
>>
>>       /* Do not jump-thread twice from the same block.  */
>>       if (bitmap_bit_p (threaded_blocks, entry->src->index)
>>
>> This limitation discards paths that are sub-paths of paths that have
>> already been threaded.
>>
>> The following patch scans the remaining to-be-threaded paths to identify
>> if any of them start from the same point, and are thus sub-paths of the
>> just-threaded path. By removing the common prefix of blocks in upcoming
>> threadable paths, and then rewiring first non-common block
>> appropriately, we expose new threading opportunities, since we are no
>> longer starting from the same BB. We also simplify the would-be
>> threaded paths, because we don't duplicate already duplicated paths.
>>
>> This sounds confusing, but the documentation for the entry point to my
>> patch (adjust_paths_after_duplication) shows an actual example:
>>
>> +/* After an FSM path has been jump threaded, adjust the remaining FSM
>> +   paths that are subsets of this path, so these paths can be safely
>> +   threaded within the context of the new threaded path.
>> +
>> +   For example, suppose we have just threaded:
>> +
>> +   5 -> 6 -> 7 -> 8 -> 12      =>      5 -> 6' -> 7' -> 8' -> 12'
>> +
>> +   And we have an upcoming threading candidate:
>> +   5 -> 6 -> 7 -> 8 -> 15 -> 20
>> +
>> +   This function adjusts the upcoming path into:
>> +   8' -> 15 -> 20
>>
>> Notice that we will no longer have two paths that start from the same
>> BB. One will start with bb5, while the adjusted path will start with
>> bb8'. With this we kill two birds-- we are able to thread more paths,
>> and these paths will avoid duplicating a whole mess of things that have
>> already been threaded.
>>
>> The attached patch is a subset of some upcoming work that can live on
>> its own. It bootstraps and regtests. Also, by running it on a handful
>> of .ii files, I can see that we are able to thread sub-paths that we
>> previously dropped on the floor.  More is better, right? :)
>>
>> To test this, I stole Jeff's method of using cachegrind to benchmark
>> instruction counts and conditional branches
>> (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).
>>
>> Basically, I bootstrapped two compilers, with and without improvements,
>> and used each to build a stage1 trunk. Each of these stage1-trunks were
>> used on 246 .ii GCC files I have lying around from a bootstrap some
>> random point last year. I used no special flags on builds apart from
>> --enable-languages=c,c++.
>>
>> Although I would've wished a larger improvement, this works comes for
>> free, as it's just a subset of other work I'm hacking on.
>>
>> Without further ado, here are my monumental, earth shattering improvements:
>>
>> Conditional branches
>>    Without patch: 411846839709
>>    With    patch: 411831330316
>>         %changed: -0.0037660%
>>
>> Number of instructions
>>    Without patch: 2271844650634
>>    With    patch: 2271761153578
>>         %changed: -0.0036754%
>>
>>
>> OK for trunk?
>> Aldy
>>
>> p.s. There is a lot of debugging/dumping code in my patch, which I can
>> gladly remove if/when approved. It helps keep my head straight while
>> looking at this spaghetti :).
>>
>> curr.patch
>>
>>
>> gcc/
>>
>> * tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid
>> dereferencing path[] beyond its length.
>> (debug_path): New.
>> (debug_all_paths): New.
>> (rewire_first_differing_edge): New.
>> (adjust_paths_after_duplication): New.
>> (duplicate_thread_path): Call adjust_paths_after_duplication.
>> Add new argument.
>> (thread_through_all_blocks): Add new argument to
>> duplicate_thread_path.
> This is fine for the trunk. I'd keep the dumping code as-is. It'll be
> useful in the future :-)
Retested against current trunk and committed.
Thanks.
Aldy
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-01 10:56 ` Aldy Hernandez
@ 2018-07-02 11:08 ` Christophe Lyon
2018-07-03 9:31 ` Aldy Hernandez
0 siblings, 1 reply; 17+ messages in thread
From: Christophe Lyon @ 2018-07-02 11:08 UTC (permalink / raw)
To: Aldy Hernandez; +Cc: Jeff Law, gcc Patches
On Sun, 1 Jul 2018 at 12:56, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>
>
> On 06/29/2018 02:50 PM, Jeff Law wrote:
> > [ Returning to another old patch... ]
> >
> > On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> >> [One more time, but without rejected HTML mail, because apparently this
> >> is my first post to gcc-patches *ever* ;-)].
> >>
> >> Howdy!
> >>
> >> While poking around in the backwards threader I noticed that we bail if
> >> we have already seen a starting BB.
> >>
> >> /* Do not jump-thread twice from the same block. */
> >> if (bitmap_bit_p (threaded_blocks, entry->src->index)
> >>
> >> This limitation discards paths that are sub-paths of paths that have
> >> already been threaded.
> >>
> >> The following patch scans the remaining to-be-threaded paths to identify
> >> if any of them start from the same point, and are thus sub-paths of the
> >> just-threaded path. By removing the common prefix of blocks in upcoming
> >> threadable paths, and then rewiring first non-common block
> >> appropriately, we expose new threading opportunities, since we are no
> >> longer starting from the same BB. We also simplify the would-be
> >> threaded paths, because we don't duplicate already duplicated paths.
> >>
> >> This sounds confusing, but the documentation for the entry point to my
> >> patch (adjust_paths_after_duplication) shows an actual example:
> >>
> >> +/* After an FSM path has been jump threaded, adjust the remaining FSM
> >> + paths that are subsets of this path, so these paths can be safely
> >> + threaded within the context of the new threaded path.
> >> +
> >> + For example, suppose we have just threaded:
> >> +
> >> + 5 -> 6 -> 7 -> 8 -> 12 => 5 -> 6' -> 7' -> 8' -> 12'
> >> +
> >> + And we have an upcoming threading candidate:
> >> + 5 -> 6 -> 7 -> 8 -> 15 -> 20
> >> +
> >> + This function adjusts the upcoming path into:
> >> + 8' -> 15 -> 20
> >>
> >> Notice that we will no longer have two paths that start from the same
> >> BB. One will start with bb5, while the adjusted path will start with
> >> bb8'. With this we kill two birds-- we are able to thread more paths,
> >> and these paths will avoid duplicating a whole mess of things that have
> >> already been threaded.
> >>
> >> The attached patch is a subset of some upcoming work that can live on
> >> its own. It bootstraps and regtests. Also, by running it on a handful
> >> of .ii files, I can see that we are able to thread sub-paths that we
> >> previously dropped on the floor. More is better, right? :)
> >>
> >> To test this, I stole Jeff's method of using cachegrind to benchmark
> >> instruction counts and conditional branches
> >> (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).
> >>
> >> Basically, I bootstrapped two compilers, with and without improvements,
> >> and used each to build a stage1 trunk. Each of these stage1-trunks were
> >> used on 246 .ii GCC files I have lying around from a bootstrap some
> >> random point last year. I used no special flags on builds apart from
> >> --enable-languages=c,c++.
> >>
> >> Although I would've wished a larger improvement, this works comes for
> >> free, as it's just a subset of other work I'm hacking on.
> >>
> >> Without further ado, here are my monumental, earth shattering improvements:
> >>
> >> Conditional branches
> >> Without patch: 411846839709
> >> With patch: 411831330316
> >> %changed: -0.0037660%
> >>
> >> Number of instructions
> >> Without patch: 2271844650634
> >> With patch: 2271761153578
> >> %changed: -0.0036754%
> >>
> >>
> >> OK for trunk?
> >> Aldy
> >>
> >> p.s. There is a lot of debugging/dumping code in my patch, which I can
> >> gladly remove if/when approved. It helps keep my head straight while
> >> looking at this spaghetti :).
> >>
> >> curr.patch
> >>
> >>
> >> gcc/
> >>
> >> * tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid
> >> dereferencing path[] beyond its length.
> >> (debug_path): New.
> >> (debug_all_paths): New.
> >> (rewire_first_differing_edge): New.
> >> (adjust_paths_after_duplication): New.
> >> (duplicate_thread_path): Call adjust_paths_after_duplication.
> >> Add new argument.
> >> (thread_through_all_blocks): Add new argument to
> >> duplicate_thread_path.
> > This is fine for the trunk. I'd keep the dumping code as-is. It'll be
> > useful in the future :-)
>
> Retested against current trunk and committed.
>
Hi,
I've noticed a regression on aarch64:
FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
threaded: 3"
very likely caused by this patch (appeared between 262282 and 262294)
Christophe
> Thanks.
>
> Aldy
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-02 11:08 ` Christophe Lyon
@ 2018-07-03 9:31 ` Aldy Hernandez
2018-07-04 0:16 ` Jeff Law
0 siblings, 1 reply; 17+ messages in thread
From: Aldy Hernandez @ 2018-07-03 9:31 UTC (permalink / raw)
To: Christophe Lyon; +Cc: Jeff Law, gcc Patches
[-- Attachment #1: Type: text/plain, Size: 3017 bytes --]
On 07/02/2018 07:08 AM, Christophe Lyon wrote:
>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>>>> While poking around in the backwards threader I noticed that we bail if
>>>> we have already seen a starting BB.
>>>>
>>>> /* Do not jump-thread twice from the same block. */
>>>> if (bitmap_bit_p (threaded_blocks, entry->src->index)
>>>>
>>>> This limitation discards paths that are sub-paths of paths that have
>>>> already been threaded.
>>>>
>>>> The following patch scans the remaining to-be-threaded paths to identify
>>>> if any of them start from the same point, and are thus sub-paths of the
>>>> just-threaded path. By removing the common prefix of blocks in upcoming
>>>> threadable paths, and then rewiring first non-common block
>>>> appropriately, we expose new threading opportunities, since we are no
>>>> longer starting from the same BB. We also simplify the would-be
>>>> threaded paths, because we don't duplicate already duplicated paths.
[snip]
> Hi,
>
> I've noticed a regression on aarch64:
> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
> threaded: 3"
> very likely caused by this patch (appeared between 262282 and 262294)
>
> Christophe
The test needs to be adjusted here.
The long story is that the aarch64 IL is different at thread3 time in
that it has 2 profitable sub-paths that can now be threaded with my
patch. This is causing the threaded count to be 5 for aarch64, versus 3
for x86-64. Previously we couldn't thread these in aarch64, so the
backwards threader would bail.
One can see the different threading opportunities by sticking
debug_all_paths() at the top of thread_through_all_blocks(). You will
notice that aarch64 has far more candidates to begin with. The IL on
the x86 backend has no paths that start from the same BB. The aarch64
IL, on the other hand, has many to choose from:
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
Some of these prove unprofitable, but 2 more than before are profitable now.
BTW, I see another threading related failure on aarch64 which is
unrelated to my patch, and was previously there:
FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
threaded"
This is probably another IL incompatibility between architectures.
Anyways... the attached patch fixes the regression. I have added a note
to the test explaining the IL differences. We really should rewrite all
the threading tests (I am NOT volunteering ;-)).
OK for trunk?
Aldy
[-- Attachment #2: curr.patch --]
[-- Type: text/x-patch, Size: 1417 bytes --]
gcc/testsuite/
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
has a slightly different IL that provides more threading
opportunities.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 9ee8d12010b..e395de26ec0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -2,11 +2,16 @@
/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
/* { dg-final { scan-tree-dump "Jumps threaded: 16" "thread1" } } */
/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2" } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom3" } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "vrp2" } } */
+/* Most architectures get 3 threadable paths here, whereas aarch64 and
+ possibly others get 5. We really should rewrite threading tests to
+ test a specific IL sequence, not gobs of code whose IL can vary
+ from architecture to architecture. */
+/* { dg-final { scan-tree-dump "Jumps threaded: \[35\]" "thread3" } } */
+
enum STATE {
S0=0,
SI,
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-03 9:31 ` Aldy Hernandez
@ 2018-07-04 0:16 ` Jeff Law
2018-07-04 8:12 ` Aldy Hernandez
0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2018-07-04 0:16 UTC (permalink / raw)
To: Aldy Hernandez, Christophe Lyon; +Cc: gcc Patches
On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
> On 07/02/2018 07:08 AM, Christophe Lyon wrote:
>
>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>>>>> While poking around in the backwards threader I noticed that we bail if
>>>>>
>>>>> we have already seen a starting BB.
>>>>>
>>>>>      /* Do not jump-thread twice from the same block.  */
>>>>>      if (bitmap_bit_p (threaded_blocks, entry->src->index)
>>>>>
>>>>> This limitation discards paths that are sub-paths of paths that have
>>>>> already been threaded.
>>>>>
>>>>> The following patch scans the remaining to-be-threaded paths to identify
>>>>>
>>>>> if any of them start from the same point, and are thus sub-paths of the
>>>>>
>>>>> just-threaded path.  By removing the common prefix of blocks in upcoming
>>>>>
>>>>> threadable paths, and then rewiring first non-common block
>>>>> appropriately, we expose new threading opportunities, since we are no
>>>>> longer starting from the same BB.  We also simplify the would-be
>>>>> threaded paths, because we don't duplicate already duplicated paths.
> [snip]
>> Hi,
>>
>> I've noticed a regression on aarch64:
>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
>> threaded: 3"
>> very likely caused by this patch (appeared between 262282 and 262294)
>>
>> Christophe
>
> The test needs to be adjusted here.
>
> The long story is that the aarch64 IL is different at thread3 time in
> that it has 2 profitable sub-paths that can now be threaded with my
> patch. This is causing the threaded count to be 5 for aarch64, versus 3
> for x86 64. Previously we couldn't thread these in aarch64, so the
> backwards threader would bail.
>
> One can see the different threading opportunities by sticking
> debug_all_paths() at the top of thread_through_all_blocks(). You will
> notice that aarch64 has far more candidates to begin with. The IL on
> the x86 backend, has no paths that start on the same BB. The aarch64,
> on the other hand, has many to choose from:
>
> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
>
> Some of these prove unprofitable, but 2 more than before are profitable now.
>
>
> BTW, I see another threading related failure on aarch64 which is
> unrelated to my patch, and was previously there:
>
> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
> threaded"
>
> This is probably another IL incompatibility between architectures.
>
> Anyways... the attached path fixes the regression. I have added a note
> to the test explaining the IL differences. We really should rewrite all
> the threading tests (I am NOT volunteering ;-)).
>
> OK for trunk?
> Aldy
>
> curr.patch
>
>
> gcc/testsuite/
>
> * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
> has a slightly different IL that provides more threading
> opportunities.
OK.
WRT rewriting the tests. I'd certainly agree that we don't have the
right set of knobs to allow us to characterize the target nor do we have
the right dumping/scanning facilities to describe and query the CFG changes.
The fact that the IL changes so much across targets is a sign that
target dependency (probably BRANCH_COST) is twiddling the gimple we
generate. I strongly suspect we'd be a lot better off if we tackled the
BRANCH_COST problem first.
jeff
ps. That particular test is the test which led to the creation of the
backwards jump threader :-)
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-04 0:16 ` Jeff Law
@ 2018-07-04 8:12 ` Aldy Hernandez
2018-07-04 9:03 ` Richard Biener
2018-07-05 19:27 ` Jeff Law
0 siblings, 2 replies; 17+ messages in thread
From: Aldy Hernandez @ 2018-07-04 8:12 UTC (permalink / raw)
To: Jeff Law, Christophe Lyon; +Cc: gcc Patches
On 07/03/2018 08:16 PM, Jeff Law wrote:
> On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
>> On 07/02/2018 07:08 AM, Christophe Lyon wrote:
>>
>>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>>>>>> While poking around in the backwards threader I noticed that we bail if
>>>>>>
>>>>>> we have already seen a starting BB.
>>>>>>
>>>>>>         /* Do not jump-thread twice from the same block.  */
>>>>>>         if (bitmap_bit_p (threaded_blocks, entry->src->index)
>>>>>>
>>>>>> This limitation discards paths that are sub-paths of paths that have
>>>>>> already been threaded.
>>>>>>
>>>>>> The following patch scans the remaining to-be-threaded paths to identify
>>>>>>
>>>>>> if any of them start from the same point, and are thus sub-paths of the
>>>>>>
>>>>>>> just-threaded path.  By removing the common prefix of blocks in upcoming
>>>>>>
>>>>>> threadable paths, and then rewiring first non-common block
>>>>>> appropriately, we expose new threading opportunities, since we are no
>>>>>>> longer starting from the same BB.  We also simplify the would-be
>>>>>> threaded paths, because we don't duplicate already duplicated paths.
>> [snip]
>>> Hi,
>>>
>>> I've noticed a regression on aarch64:
>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
>>> threaded: 3"
>>> very likely caused by this patch (appeared between 262282 and 262294)
>>>
>>> Christophe
>>
>> The test needs to be adjusted here.
>>
>> The long story is that the aarch64 IL is different at thread3 time in
>> that it has 2 profitable sub-paths that can now be threaded with my
>> patch. This is causing the threaded count to be 5 for aarch64, versus 3
>> for x86 64. Previously we couldn't thread these in aarch64, so the
>> backwards threader would bail.
>>
>> One can see the different threading opportunities by sticking
>> debug_all_paths() at the top of thread_through_all_blocks(). You will
>> notice that aarch64 has far more candidates to begin with. The IL on
>> the x86 backend, has no paths that start on the same BB. The aarch64,
>> on the other hand, has many to choose from:
>>
>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
>>
>> Some of these prove unprofitable, but 2 more than before are profitable now.
>>
>>
>> BTW, I see another threading related failure on aarch64 which is
>> unrelated to my patch, and was previously there:
>>
>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
>> threaded"
>>
>> This is probably another IL incompatibility between architectures.
>>
>> Anyways... the attached path fixes the regression. I have added a note
>> to the test explaining the IL differences. We really should rewrite all
>> the threading tests (I am NOT volunteering ;-)).
>>
>> OK for trunk?
>> Aldy
>>
>> curr.patch
>>
>>
>> gcc/testsuite/
>>
>> * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
>> has a slightly different IL that provides more threading
>> opportunities.
> OK.
>
> WRT rewriting the tests. I'd certainly agree that we don't have the
> right set of knobs to allow us to characterize the target nor do we have
> the right dumping/scanning facilities to describe and query the CFG changes.
>
> The fact that the IL changes so much across targets is a sign that
> target dependency (probably BRANCH_COST) is twiddling the gimple we
> generate. I strongly suspect we'd be a lot better off if we tackled the
> BRANCH_COST problem first.
Huh. I've always accepted differing IL between architectures as a
necessary evil for things like auto-vectorization and the like.
What's the ideal plan here? A knob to set default values for target
dependent variables that can affect IL layout? Then we could pass
-fthis-is-an-IL-test and things be normalized?
Thanks.
Aldy
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-04 8:12 ` Aldy Hernandez
@ 2018-07-04 9:03 ` Richard Biener
2018-07-05 19:27 ` Jeff Law
1 sibling, 0 replies; 17+ messages in thread
From: Richard Biener @ 2018-07-04 9:03 UTC (permalink / raw)
To: Aldy Hernandez; +Cc: Jeff Law, christophe.lyon, GCC Patches
On Wed, Jul 4, 2018 at 10:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
>
>
> On 07/03/2018 08:16 PM, Jeff Law wrote:
> > On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
> >> On 07/02/2018 07:08 AM, Christophe Lyon wrote:
> >>
> >>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> >>>>>> While poking around in the backwards threader I noticed that we bail if
> >>>>>>
> >>>>>> we have already seen a starting BB.
> >>>>>>
> >>>>>> /* Do not jump-thread twice from the same block. */
> >>>>>> if (bitmap_bit_p (threaded_blocks, entry->src->index)
> >>>>>>
> >>>>>> This limitation discards paths that are sub-paths of paths that have
> >>>>>> already been threaded.
> >>>>>>
> >>>>>> The following patch scans the remaining to-be-threaded paths to identify
> >>>>>>
> >>>>>> if any of them start from the same point, and are thus sub-paths of the
> >>>>>>
> >>>>>> just-threaded path. By removing the common prefix of blocks in upcoming
> >>>>>>
> >>>>>> threadable paths, and then rewiring first non-common block
> >>>>>> appropriately, we expose new threading opportunities, since we are no
> >>>>>> longer starting from the same BB. We also simplify the would-be
> >>>>>> threaded paths, because we don't duplicate already duplicated paths.
> >> [snip]
> >>> Hi,
> >>>
> >>> I've noticed a regression on aarch64:
> >>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
> >>> threaded: 3"
> >>> very likely caused by this patch (appeared between 262282 and 262294)
> >>>
> >>> Christophe
> >>
> >> The test needs to be adjusted here.
> >>
> >> The long story is that the aarch64 IL is different at thread3 time in
> >> that it has 2 profitable sub-paths that can now be threaded with my
> >> patch. This is causing the threaded count to be 5 for aarch64, versus 3
> >> for x86 64. Previously we couldn't thread these in aarch64, so the
> >> backwards threader would bail.
> >>
> >> One can see the different threading opportunities by sticking
> >> debug_all_paths() at the top of thread_through_all_blocks(). You will
> >> notice that aarch64 has far more candidates to begin with. The IL on
> >> the x86 backend, has no paths that start on the same BB. The aarch64,
> >> on the other hand, has many to choose from:
> >>
> >> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
> >> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
> >> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
> >> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
> >> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
> >> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
> >>
> >> Some of these prove unprofitable, but 2 more than before are profitable now.
> >>
> >>
> >> BTW, I see another threading related failure on aarch64 which is
> >> unrelated to my patch, and was previously there:
> >>
> >> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
> >> threaded"
> >>
> >> This is probably another IL incompatibility between architectures.
> >>
> >> Anyways... the attached path fixes the regression. I have added a note
> >> to the test explaining the IL differences. We really should rewrite all
> >> the threading tests (I am NOT volunteering ;-)).
> >>
> >> OK for trunk?
> >> Aldy
> >>
> >> curr.patch
> >>
> >>
> >> gcc/testsuite/
> >>
> >> * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
> >> has a slightly different IL that provides more threading
> >> opportunities.
> > OK.
> >
> > WRT rewriting the tests. I'd certainly agree that we don't have the
> > right set of knobs to allow us to characterize the target nor do we have
> > the right dumping/scanning facilities to describe and query the CFG changes.
> >
> > The fact that the IL changes so much across targets is a sign that
> > target dependency (probably BRANCH_COST) is twiddling the gimple we
> > generate. I strongly suspect we'd be a lot better off if we tackled the
> > BRANCH_COST problem first.
>
> Huh. I've always accepted differing IL between architectures as a
> necessary evil for things like auto-vectorization and the like.
>
> What's the ideal plan here? A knob to set default values for target
> dependent variables that can affect IL layout? Then we could pass
> -fthis-is-an-IL-test and things be normalized?
Use gimple testcases. It's currently a bit awkward in places, but it
worked for me in a few cases.
Richard.
> Thanks.
> Aldy
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-04 8:12 ` Aldy Hernandez
2018-07-04 9:03 ` Richard Biener
@ 2018-07-05 19:27 ` Jeff Law
2018-07-09 7:19 ` Aldy Hernandez
1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2018-07-05 19:27 UTC (permalink / raw)
To: Aldy Hernandez, Christophe Lyon; +Cc: gcc Patches
On 07/04/2018 02:12 AM, Aldy Hernandez wrote:
>
>
> On 07/03/2018 08:16 PM, Jeff Law wrote:
>> On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
>>> On 07/02/2018 07:08 AM, Christophe Lyon wrote:
>>>
>>>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>>>>>>>> While poking around in the backwards threader I noticed that we bail if
>>>>>>>> we have already seen a starting BB.
>>>>>>>>
>>>>>>>>          /* Do not jump-thread twice from the same block.  */
>>>>>>>>          if (bitmap_bit_p (threaded_blocks, entry->src->index)
>>>>>>>>
>>>>>>>> This limitation discards paths that are sub-paths of paths that have
>>>>>>>> already been threaded.
>>>>>>>>
>>>>>>>> The following patch scans the remaining to-be-threaded paths to identify
>>>>>>>> if any of them start from the same point, and are thus sub-paths of the
>>>>>>>> just-threaded path.  By removing the common prefix of blocks in upcoming
>>>>>>>> threadable paths, and then rewiring first non-common block
>>>>>>>> appropriately, we expose new threading opportunities, since we are no
>>>>>>>> longer starting from the same BB.  We also simplify the would-be
>>>>>>>> threaded paths, because we don't duplicate already duplicated paths.
>>> [snip]
>>>> Hi,
>>>>
>>>> I've noticed a regression on aarch64:
>>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
>>>> threaded: 3"
>>>> very likely caused by this patch (appeared between 262282 and 262294)
>>>>
>>>> Christophe
>>>
>>> The test needs to be adjusted here.
>>>
>>> The long story is that the aarch64 IL is different at thread3 time in
>>> that it has 2 profitable sub-paths that can now be threaded with my
>>> patch. This is causing the threaded count to be 5 for aarch64, versus 3
>>> for x86 64. Previously we couldn't thread these in aarch64, so the
>>> backwards threader would bail.
>>>
>>> One can see the different threading opportunities by sticking
>>> debug_all_paths() at the top of thread_through_all_blocks(). You will
>>> notice that aarch64 has far more candidates to begin with. The IL on
>>> the x86 backend, has no paths that start on the same BB. The aarch64,
>>> on the other hand, has many to choose from:
>>>
>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
>>>
>>> Some of these prove unprofitable, but 2 more than before are profitable now.
>>>
>>>
>>>
>>> BTW, I see another threading related failure on aarch64 which is
>>> unrelated to my patch, and was previously there:
>>>
>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
>>> threaded"
>>>
>>> This is probably another IL incompatibility between architectures.
>>>
>>> Anyways... the attached patch fixes the regression. I have added a note
>>> to the test explaining the IL differences. We really should rewrite all
>>> the threading tests (I am NOT volunteering ;-)).
>>>
>>> OK for trunk?
>>> Aldy
>>>
>>> curr.patch
>>>
>>>
>>> gcc/testsuite/
>>>
>>>     * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
>>>     has a slightly different IL that provides more threading
>>>     opportunities.
>> OK.
>>
>> WRT rewriting the tests. I'd certainly agree that we don't have the
>> right set of knobs to allow us to characterize the target nor do we have
>> the right dumping/scanning facilities to describe and query the CFG
>> changes.
>>
>> The fact that the IL changes so much across targets is a sign that
>> target dependency (probably BRANCH_COST) is twiddling the gimple we
>> generate. I strongly suspect we'd be a lot better off if we tackled the
>> BRANCH_COST problem first.
>
> Huh. I've always accepted differing IL between architectures as a
> necessary evil for things like auto-vectorization and the like.
Yes. We've made a conscious decision that introducing target
dependencies for the autovectorizer makes sense. BRANCH_COST on the
other hand is a different beast :-)
>
> What's the ideal plan here? A knob to set default values for target
> dependent variables that can affect IL layout? Then we could pass
> -fthis-is-an-IL-test and things be normalized?
Well, lots of things.
I'd like decisions about how to expand branches deferred until rtl
expansion. Kai was poking at this in the past but never really got any
traction.
Many tests should turn into gimple IL tests.
And I'd like a better framework for testing what we're doing to the IL.
Jeff
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-05 19:27 ` Jeff Law
@ 2018-07-09 7:19 ` Aldy Hernandez
2018-07-09 8:33 ` Richard Biener
2018-07-09 19:56 ` Jeff Law
0 siblings, 2 replies; 17+ messages in thread
From: Aldy Hernandez @ 2018-07-09 7:19 UTC (permalink / raw)
To: Jeff Law, Christophe Lyon; +Cc: gcc Patches
On 07/05/2018 03:27 PM, Jeff Law wrote:
> On 07/04/2018 02:12 AM, Aldy Hernandez wrote:
>>
>>
>> On 07/03/2018 08:16 PM, Jeff Law wrote:
>>> On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
>>>> On 07/02/2018 07:08 AM, Christophe Lyon wrote:
>>>>
>>>>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>>>>>>>> While poking around in the backwards threader I noticed that we bail if
>>>>>>>> we have already seen a starting BB.
>>>>>>>>
>>>>>>>>          /* Do not jump-thread twice from the same block.  */
>>>>>>>>          if (bitmap_bit_p (threaded_blocks, entry->src->index)
>>>>>>>>
>>>>>>>> This limitation discards paths that are sub-paths of paths that have
>>>>>>>> already been threaded.
>>>>>>>>
>>>>>>>> The following patch scans the remaining to-be-threaded paths to identify
>>>>>>>> if any of them start from the same point, and are thus sub-paths of the
>>>>>>>> just-threaded path.  By removing the common prefix of blocks in upcoming
>>>>>>>> threadable paths, and then rewiring first non-common block
>>>>>>>> appropriately, we expose new threading opportunities, since we are no
>>>>>>>> longer starting from the same BB.  We also simplify the would-be
>>>>>>>> threaded paths, because we don't duplicate already duplicated paths.
>>>> [snip]
>>>>> Hi,
>>>>>
>>>>> I've noticed a regression on aarch64:
>>>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
>>>>> threaded: 3"
>>>>> very likely caused by this patch (appeared between 262282 and 262294)
>>>>>
>>>>> Christophe
>>>>
>>>> The test needs to be adjusted here.
>>>>
>>>> The long story is that the aarch64 IL is different at thread3 time in
>>>> that it has 2 profitable sub-paths that can now be threaded with my
>>>> patch. This is causing the threaded count to be 5 for aarch64, versus 3
>>>> for x86 64. Previously we couldn't thread these in aarch64, so the
>>>> backwards threader would bail.
>>>>
>>>> One can see the different threading opportunities by sticking
>>>> debug_all_paths() at the top of thread_through_all_blocks(). You will
>>>> notice that aarch64 has far more candidates to begin with. The IL on
>>>> the x86 backend, has no paths that start on the same BB. The aarch64,
>>>> on the other hand, has many to choose from:
>>>>
>>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
>>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
>>>>
>>>> Some of these prove unprofitable, but 2 more than before are profitable now.
>>>>
>>>>
>>>>
>>>> BTW, I see another threading related failure on aarch64 which is
>>>> unrelated to my patch, and was previously there:
>>>>
>>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
>>>> threaded"
>>>>
>>>> This is probably another IL incompatibility between architectures.
>>>>
>>>> Anyways... the attached patch fixes the regression. I have added a note
>>>> to the test explaining the IL differences. We really should rewrite all
>>>> the threading tests (I am NOT volunteering ;-)).
>>>>
>>>> OK for trunk?
>>>> Aldy
>>>>
>>>> curr.patch
>>>>
>>>>
>>>> gcc/testsuite/
>>>>
>>>>     * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
>>>>     has a slightly different IL that provides more threading
>>>>     opportunities.
>>> OK.
>>>
>>> WRT rewriting the tests. I'd certainly agree that we don't have the
>>> right set of knobs to allow us to characterize the target nor do we have
>>> the right dumping/scanning facilities to describe and query the CFG
>>> changes.
>>>
>>> The fact that the IL changes so much across targets is a sign that
>>> target dependency (probably BRANCH_COST) is twiddling the gimple we
>>> generate. I strongly suspect we'd be a lot better off if we tackled the
>>> BRANCH_COST problem first.
>>
>> Huh. I've always accepted differing IL between architectures as a
>> necessary evil for things like auto-vectorization and the like.
> Yes. We've made a conscious decision that introducing target
> dependencies for the autovectorizer makes sense. BRANCH_COST on the
> other hand is a different beast :-)
>
>>
>> What's the ideal plan here? A knob to set default values for target
>> dependent variables that can affect IL layout? Then we could pass
>> -fthis-is-an-IL-test and things be normalized?
> Well, lots of things.
>
> I'd like decisions about how to expand branches deferred until rtl
> expansion. Kai was poking at this in the past but never really got any
> traction.
For the record, the problem in this testcase is that switch lowering is
riddled with back end specific knowledge (GET_MODE_SIZE uses as well as
some rtx cost hacks).
>
> Many tests should turn into gimple IL tests.
Yeah, though for tests like the threading ones, they're already
sufficiently convoluted that turning them into gimple IL tests will make
them even harder to read. Oh well, I guess?
To make it easier to transition to gimple IL tests, I suppose we could
add -fdump-tree-*-something-more-suitable-for-gimplefe ;-). Is it by
design that the gimple fe is sufficiently different from what we dump
for -fdump-tree (ahem, things like phi arguments, probabilities, etc)?
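For instance (quoting from memory, so treat the exact spelling below as an
illustration rather than gospel), a plain -fdump-tree-* dump prints a PHI
along the lines of:

  # res_4 = PHI <res_2(3), res_3(4)>

whereas IIUC the gimple FE wants the __BB/__PHI spelling:

  res_4 = __PHI (__BB3: res_2, __BB4: res_3);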
>
> And I'd like a better framework for testing what we're doing to the IL.
Sigh. Me too.
Aldy
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-09 7:19 ` Aldy Hernandez
@ 2018-07-09 8:33 ` Richard Biener
2018-07-09 8:47 ` Aldy Hernandez
2018-07-09 19:56 ` Jeff Law
1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2018-07-09 8:33 UTC (permalink / raw)
To: Aldy Hernandez; +Cc: Jeff Law, Christophe Lyon, GCC Patches
On Mon, Jul 9, 2018 at 9:19 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
>
>
> On 07/05/2018 03:27 PM, Jeff Law wrote:
> > On 07/04/2018 02:12 AM, Aldy Hernandez wrote:
> >>
> >>
> >> On 07/03/2018 08:16 PM, Jeff Law wrote:
> >>> On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
> >>>> On 07/02/2018 07:08 AM, Christophe Lyon wrote:
> >>>>
> >>>>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> >>>>>>>> While poking around in the backwards threader I noticed that we bail if
> >>>>>>>> we have already seen a starting BB.
> >>>>>>>>
> >>>>>>>> /* Do not jump-thread twice from the same block. */
> >>>>>>>> if (bitmap_bit_p (threaded_blocks, entry->src->index)
> >>>>>>>>
> >>>>>>>> This limitation discards paths that are sub-paths of paths that have
> >>>>>>>> already been threaded.
> >>>>>>>>
> >>>>>>>> The following patch scans the remaining to-be-threaded paths to identify
> >>>>>>>> if any of them start from the same point, and are thus sub-paths of the
> >>>>>>>> just-threaded path. By removing the common prefix of blocks in upcoming
> >>>>>>>> threadable paths, and then rewiring first non-common block
> >>>>>>>> appropriately, we expose new threading opportunities, since we are no
> >>>>>>>> longer starting from the same BB. We also simplify the would-be
> >>>>>>>> threaded paths, because we don't duplicate already duplicated paths.
> >>>> [snip]
> >>>>> Hi,
> >>>>>
> >>>>> I've noticed a regression on aarch64:
> >>>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
> >>>>> threaded: 3"
> >>>>> very likely caused by this patch (appeared between 262282 and 262294)
> >>>>>
> >>>>> Christophe
> >>>>
> >>>> The test needs to be adjusted here.
> >>>>
> >>>> The long story is that the aarch64 IL is different at thread3 time in
> >>>> that it has 2 profitable sub-paths that can now be threaded with my
> >>>> patch. This is causing the threaded count to be 5 for aarch64, versus 3
> >>>> for x86 64. Previously we couldn't thread these in aarch64, so the
> >>>> backwards threader would bail.
> >>>>
> >>>> One can see the different threading opportunities by sticking
> >>>> debug_all_paths() at the top of thread_through_all_blocks(). You will
> >>>> notice that aarch64 has far more candidates to begin with. The IL on
> >>>> the x86 backend, has no paths that start on the same BB. The aarch64,
> >>>> on the other hand, has many to choose from:
> >>>>
> >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
> >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
> >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
> >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
> >>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
> >>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
> >>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
> >>>>
> >>>> Some of these prove unprofitable, but 2 more than before are profitable now.
> >>>>
> >>>>
> >>>>
> >>>> BTW, I see another threading related failure on aarch64 which is
> >>>> unrelated to my patch, and was previously there:
> >>>>
> >>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
> >>>> threaded"
> >>>>
> >>>> This is probably another IL incompatibility between architectures.
> >>>>
> >>>> Anyways... the attached patch fixes the regression. I have added a note
> >>>> to the test explaining the IL differences. We really should rewrite all
> >>>> the threading tests (I am NOT volunteering ;-)).
> >>>>
> >>>> OK for trunk?
> >>>> Aldy
> >>>>
> >>>> curr.patch
> >>>>
> >>>>
> >>>> gcc/testsuite/
> >>>>
> >>>> * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
> >>>> has a slightly different IL that provides more threading
> >>>> opportunities.
> >>> OK.
> >>>
> >>> WRT rewriting the tests. I'd certainly agree that we don't have the
> >>> right set of knobs to allow us to characterize the target nor do we have
> >>> the right dumping/scanning facilities to describe and query the CFG
> >>> changes.
> >>>
> >>> The fact that the IL changes so much across targets is a sign that
> >>> target dependency (probably BRANCH_COST) is twiddling the gimple we
> >>> generate. I strongly suspect we'd be a lot better off if we tackled the
> >>> BRANCH_COST problem first.
> >>
> >> Huh. I've always accepted differing IL between architectures as a
> >> necessary evil for things like auto-vectorization and the like.
> > Yes. We've made a conscious decision that introducing target
> > dependencies for the autovectorizer makes sense. BRANCH_COST on the
> > other hand is a different beast :-)
> >
> >>
> >> What's the ideal plan here? A knob to set default values for target
> >> dependent variables that can affect IL layout? Then we could pass
> >> -fthis-is-an-IL-test and things be normalized?
> > Well, lots of things.
> >
> > I'd like decisions about how to expand branches deferred until rtl
> > expansion. Kai was poking at this in the past but never really got any
> > traction.
>
> For the record, the problem in this testcase is that switch lowering is
> riddled with back end specific knowledge (GET_MODE_SIZE uses as well as
> some rtx cost hacks).
>
> >
> > Many tests should turn into gimple IL tests.
>
> Yeah, though for tests like the threading ones, they're already
> sufficiently convoluted that turning them into gimple IL tests will make
> them even harder to read. Oh well, I guess?
>
> To make it easier to transition to gimple IL tests, I suppose we could
> add -fdump-tree-*-something-more-suitable-for-gimplefe ;-).
There is -fdump-tree-*-gimple ;)
> Is it by
> design that the gimple fe is sufficiently different from what we dump
> for -fdump-tree (ahem, things like phi arguments, probabilities, etc)?
Well. The only reason is that I didn't want to adjust all testcases, so
I introduced the -gimple dump modifier...
Note there's still some "magic" fiddling needed to make the GIMPLE FE
grok -gimple dump files as well as working around limitations with the
current way of having SSA/CFG testcases.
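To give you an idea (written from memory, so double-check the exact
spelling against the gcc.dg/gimplefe-*.c tests before relying on it), a
CFG/SSA testcase for the GIMPLE FE looks roughly like this; the startwith
pass name and the function body are just an example:

  /* { dg-do compile } */
  /* { dg-options "-O2 -fgimple -fdump-tree-thread1-stats" } */

  int __GIMPLE (ssa,startwith("thread1"))
  foo (int a)
  {
    int res;

    __BB(2):
    if (a_1(D) > 9)
      goto __BB3;
    else
      goto __BB4;

    __BB(3):
    res_2 = 1;
    goto __BB5;

    __BB(4):
    res_3 = 0;
    goto __BB5;

    __BB(5):
    res_4 = __PHI (__BB3: res_2, __BB4: res_3);
    return res_4;
  }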
> >
> > And I'd like a better framework for testing what we're doing to the IL.
>
> Sigh. Me too.
Well, the GIMPLE FE is supposed to be that - it's just not perfect
(or rather incomplete).
Richard.
> Aldy
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-09 8:33 ` Richard Biener
@ 2018-07-09 8:47 ` Aldy Hernandez
0 siblings, 0 replies; 17+ messages in thread
From: Aldy Hernandez @ 2018-07-09 8:47 UTC (permalink / raw)
To: Richard Biener; +Cc: Jeff Law, Christophe Lyon, GCC Patches
On 07/09/2018 04:29 AM, Richard Biener wrote:
> On Mon, Jul 9, 2018 at 9:19 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>>
>>
>>
>> On 07/05/2018 03:27 PM, Jeff Law wrote:
>>> On 07/04/2018 02:12 AM, Aldy Hernandez wrote:
>>> Many tests should turn into gimple IL tests.
>>
>> Yeah, though for tests like the threading ones, they're already
>> sufficiently convoluted that turning them into gimple IL tests will make
>> them even harder to read. Oh well, I guess?
>>
>> To make it easier to transition to gimple IL tests, I suppose we could
>> add -fdump-tree-*-something-more-suitable-for-gimplefe ;-).
>
> There is -fdump-tree-*-gimple ;)
Ahh!!! Thanks.
>
>> Is it by
>> design that the gimple fe is sufficiently different from what we dump
>> for -fdump-tree (ahem, things like phi arguments, probabilities, etc)?
>
> Well. The only reason is that I didn't want to adjust all testcases, so
> I introduced the -gimple dump modifier...
>
> Note there's still some "magic" fiddling needed to make the GIMPLE FE
> grok -gimple dump files as well as working around limitations with the
> current way of having SSA/CFG testcases.
I'll keep it in mind as I deal with threading going forward.
>
>>>
>>> And I'd like a better framework for testing what we're doing to the IL.
>>
>> Sigh. Me too.
>
> Well, the GIMPLE FE is supposed to be that - it's just not perfect
> (or rather incomplete).
Hey, it's pretty good IMO :).
Thanks.
Aldy
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-09 7:19 ` Aldy Hernandez
2018-07-09 8:33 ` Richard Biener
@ 2018-07-09 19:56 ` Jeff Law
2018-07-10 8:43 ` Aldy Hernandez
1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2018-07-09 19:56 UTC (permalink / raw)
To: Aldy Hernandez, Christophe Lyon; +Cc: gcc Patches
On 07/09/2018 01:19 AM, Aldy Hernandez wrote:
>>
>> I'd like decisions about how to expand branches deferred until rtl
>> expansion. Kai was poking at this in the past but never really got any
>> traction.
>
> For the record, the problem in this testcase is that switch lowering is
> riddled with back end specific knowledge (GET_MODE_SIZE uses as well as
> some rtx cost hacks).
Yea. Switch lowering is going to have some of these as well, though I
think BRANCH_COST is more pervasive.
>
>>
>> Many tests should turn into gimple IL tests.
>
> Yeah, though for tests like the threading ones, they're already
> sufficiently convoluted that turning them into gimple IL tests will make
> them even harder to read. Oh well, I guess?
It might make them harder to read, but it would guarantee consistent
gimple fed into the optimizer across our targets, which in turn ought to
result in consistent behavior by the optimizer, and that should
simplify the tests and make them more consistent over time.
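To make that concrete (the directives below are illustrative, not what
actually went into ssa-dom-thread-7.c): with target-dependent gimple we
end up carrying per-target scans such as

  /* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target aarch64*-*-* } } } */
  /* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" { target { ! aarch64*-*-* } } } } */

whereas a gimple FE input ought to let a single count hold on every target.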
Jeff
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] jump threading multiple paths that start from the same BB
2018-07-09 19:56 ` Jeff Law
@ 2018-07-10 8:43 ` Aldy Hernandez
0 siblings, 0 replies; 17+ messages in thread
From: Aldy Hernandez @ 2018-07-10 8:43 UTC (permalink / raw)
To: Jeff Law, Christophe Lyon; +Cc: gcc Patches
On 07/09/2018 03:56 PM, Jeff Law wrote:
> On 07/09/2018 01:19 AM, Aldy Hernandez wrote:
>>>
>>> I'd like decisions about how to expand branches deferred until rtl
>>> expansion. Kai was poking at this in the past but never really got any
>>> traction.
>>
>> For the record, the problem in this testcase is that switch lowering is
>> riddled with back end specific knowledge (GET_MODE_SIZE uses as well as
>> some rtx cost hacks).
> Yea. Switch lowering is going to have some of these as well, though I
> think BRANCH_COST is more pervasive.
>
>>
>>>
>>> Many tests should turn into gimple IL tests.
>>
>> Yeah, though for tests like the threading ones, they're already
>> sufficiently convoluted that turning them into gimple IL tests will make
>> them even harder to read. Oh well, I guess?
> It might make them harder to read, but it would guarantee consistent
> gimple fed into the optimizer across our targets, which in turn ought to
> result in consistent behavior by the optimizer, and that should
> simplify the tests and make them more consistent over time.
Ok. When I submit my queued-up range-based changes to the threader, I'll
see if I can convert a big chunk of the threader tests to gimple IL.
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2018-07-10 8:43 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-07 17:38 [patch] jump threading multiple paths that start from the same BB Aldy Hernandez
2017-11-27 23:31 ` Jeff Law
2017-11-29 20:24 ` Aldy Hernandez
2017-11-30 23:38 ` Jeff Law
2018-06-29 19:04 ` Jeff Law
2018-07-01 10:56 ` Aldy Hernandez
2018-07-02 11:08 ` Christophe Lyon
2018-07-03 9:31 ` Aldy Hernandez
2018-07-04 0:16 ` Jeff Law
2018-07-04 8:12 ` Aldy Hernandez
2018-07-04 9:03 ` Richard Biener
2018-07-05 19:27 ` Jeff Law
2018-07-09 7:19 ` Aldy Hernandez
2018-07-09 8:33 ` Richard Biener
2018-07-09 8:47 ` Aldy Hernandez
2018-07-09 19:56 ` Jeff Law
2018-07-10 8:43 ` Aldy Hernandez