From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 118074 invoked by alias); 26 Aug 2016 13:03:53 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 118050 invoked by uid 89); 26 Aug 2016 13:03:51 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.6 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=WHICH, 926, PRO, Destroy X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 26 Aug 2016 13:03:47 +0000 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D7C917DD0F; Fri, 26 Aug 2016 13:03:45 +0000 (UTC) Received: from localhost.localdomain (vpn1-5-239.ams2.redhat.com [10.36.5.239]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u7QD3idI023954; Fri, 26 Aug 2016 09:03:44 -0400 Subject: Re: [PATCH v2 0/9] Separate shrink-wrapping To: Segher Boessenkool , gcc-patches@gcc.gnu.org References: From: Bernd Schmidt Message-ID: <81710c02-05bf-fb65-dedc-8ba389c0d8e8@redhat.com> Date: Fri, 26 Aug 2016 13:03:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------C2DF0FBABC6D00206892E5F8" X-IsSubscribed: yes X-SW-Source: 2016-08/txt/msg01860.txt.bz2 This is a multi-part message in MIME format. --------------C2DF0FBABC6D00206892E5F8 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Content-length: 2083 On 08/01/2016 03:42 AM, Segher Boessenkool wrote: > This is the second version. Concern was renamed to component, and all > other comments were addressed (I hope). Not really, I'm afraid. There still seems to be no detailed explanation of the algorithm. Instead, there is a vague outline (which should be expanded to at least the level of detail found e.g. in tree-ssa-pre.c), and inside the code there are still meaningless comments of the form /* Deselect those epilogue components that should not be inserted for this edge. */ which don't tell me anything about what the intention is (why should they not be inserted?). The result is that as a reader, I still find myself unable to determine whether the algorithm is correct or not. Worse, I'm still not entirely certain what it is intended to achieve: I asked for a motivating example or two, but haven't seen one in the comments anywhere. My most general question would be why the algorithm for placing individual prologue components would be different from the regular shrink-wrapping algorithm for full prologues. Examples and/or arguments relating to how this new code acts with regard to loops are also particularly needed. So, I think improvement is necessary in these points, but I also think that there's room for experimental verification by way of self-tests. If done thoroughly (testing the transformation on several sufficiently random CFGs and maybe some manually constructed ones) it would go a long way towards showing that at least we don't have to worry too much about miscompilations. That's what I've been working on, and an initial patch is below. This is incomplete and posted more as an RFD since we're getting towards the end of the week: there are gaps in the test coverage, and it currently fails the selftest. I assume the latter is a problem with my code, but it wouldn't hurt if you could take a look; maybe I misunderstood something entirely about what the separate shrink-wrapping is supposed to achieve, or maybe I messed up the algorithm with my changes. Bernd --------------C2DF0FBABC6D00206892E5F8 Content-Type: text/x-patch; name="sepsw-st4.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="sepsw-st4.diff" Content-length: 33547 diff '--width=118' -x .svn -dru sw-selftest-base/gcc/sbitmap.c with-selftest/gcc/sbitmap.c --- sw-selftest-base/gcc/sbitmap.c 2016-04-29 10:46:49.189700651 +0200 +++ with-selftest/gcc/sbitmap.c 2016-08-26 11:40:05.043374153 +0200 @@ -319,7 +319,7 @@ *dstp++ = *ap++; } -/* Return true if there are any bits set in A are also set in B. +/* Return true if there are any bits set in A which are also set in B. Return false otherwise. */ bool @@ -335,6 +335,24 @@ return true; return false; +} + +/* Return true if there are any bits set in A which are not set in B. + Return false otherwise. */ + +bool +bitmap_intersect_compl_p (const_sbitmap a, const_sbitmap b) +{ + const_sbitmap_ptr ap = a->elms; + const_sbitmap_ptr bp = b->elms; + unsigned int i, n; + + n = MIN (a->size, b->size); + for (i = 0; i < n; i++) + if ((*ap++ & ~*bp++) != 0) + return true; + + return false; } /* Set DST to be (A and B). diff '--width=118' -x .svn -dru sw-selftest-base/gcc/sbitmap.h with-selftest/gcc/sbitmap.h --- sw-selftest-base/gcc/sbitmap.h 2016-08-01 12:42:29.678435973 +0200 +++ with-selftest/gcc/sbitmap.h 2016-08-26 11:40:14.520040887 +0200 @@ -246,6 +246,7 @@ extern bool bitmap_and_or (sbitmap, const_sbitmap, const_sbitmap, const_sbitmap); extern bool bitmap_intersect_p (const_sbitmap, const_sbitmap); +extern bool bitmap_intersect_compl_p (const_sbitmap, const_sbitmap); extern bool bitmap_and (sbitmap, const_sbitmap, const_sbitmap); extern bool bitmap_ior (sbitmap, const_sbitmap, const_sbitmap); extern bool bitmap_xor (sbitmap, const_sbitmap, const_sbitmap); diff '--width=118' -x .svn -dru sw-selftest-base/gcc/selftest.h with-selftest/gcc/selftest.h --- sw-selftest-base/gcc/selftest.h 2016-08-01 12:42:57.738436176 +0200 +++ with-selftest/gcc/selftest.h 2016-08-26 11:49:50.656711729 +0200 @@ -92,6 +92,11 @@ extern void tree_cfg_c_tests (); extern void vec_c_tests (); extern void wide_int_cc_tests (); +extern void shrink_wrapping_separate_tests (); + +/* Helper functions to set up certain tests. */ +extern basic_block build_random_cfg (int); +extern void free_random_cfg (); extern int num_passes; diff '--width=118' -x .svn -dru sw-selftest-base/gcc/selftest-run-tests.c with-selftest/gcc/selftest-run-tests.c --- sw-selftest-base/gcc/selftest-run-tests.c 2016-08-01 12:43:07.411769580 +0200 +++ with-selftest/gcc/selftest-run-tests.c 2016-08-26 11:50:34.160045378 +0200 @@ -67,8 +67,9 @@ spellcheck_tree_c_tests (); tree_cfg_c_tests (); - /* This one relies on most of the above. */ + /* These ones relies on most of the above. */ function_tests_c_tests (); + shrink_wrapping_separate_tests (); /* Finished running tests. */ long finish_time = get_run_time (); diff '--width=118' -x .svn -dru sw-selftest-base/gcc/shrink-wrap.c with-selftest/gcc/shrink-wrap.c --- sw-selftest-base/gcc/shrink-wrap.c 2016-08-25 13:50:11.013943584 +0200 +++ with-selftest/gcc/shrink-wrap.c 2016-08-26 13:25:48.843420128 +0200 @@ -41,7 +41,7 @@ #include "regcprop.h" #include "rtl-iter.h" #include "valtrack.h" - +#include "selftest.h" /* Return true if INSN requires the stack frame to be set up. PROLOGUE_USED contains the hard registers used in the function @@ -1091,6 +1091,58 @@ gcov_type total_cost; }; +/* An abstract base class which holds the implementation of the algorithm. + Derived classes exist for actual shrink-wrapping, and for self-test. */ +class sw_sep_base +{ + virtual sbitmap components_for_bb (basic_block) = 0; + virtual void disqualify (edge, sbitmap, bool) = 0; + virtual void emit_prologue_into_bb (basic_block, sbitmap, bool) = 0; + virtual void emit_epilogue_into_bb (basic_block, sbitmap, bool) = 0; + virtual void emit_prologue_on_edge (edge, sbitmap) = 0; + virtual void emit_epilogue_on_edge (edge, sbitmap) = 0; + virtual void set_handled () = 0; + + void init (); + void place_prologue_for_one_component (unsigned int, basic_block); + void spread_components (basic_block head); + void disqualify_problematic_components (); + void emit_common_heads_for_components (); + void emit_common_tails_for_components (); + void insert_prologue_epilogue_for_components (); + + protected: + basic_block m_first_bb; + sbitmap m_components; + + sw_sep_base (basic_block first_bb, sbitmap components) + : m_first_bb (first_bb), m_components (components) + { + } + + public: + virtual void fini (); + sbitmap run (); +}; + +/* The derived class that performs the real optimization. */ +class sw_sep : public sw_sep_base +{ + sbitmap components_for_bb (basic_block); + void disqualify (edge, sbitmap, bool); + void emit_prologue_into_bb (basic_block, sbitmap, bool); + void emit_epilogue_into_bb (basic_block, sbitmap, bool); + void emit_prologue_on_edge (edge, sbitmap); + void emit_epilogue_on_edge (edge, sbitmap); + void set_handled (); + + public: + sw_sep (basic_block first_bb, sbitmap components) + : sw_sep_base (first_bb, components) + { + } +}; + /* A helper function for accessing the pass-specific info. */ static inline struct sw * SW (basic_block bb) @@ -1101,26 +1153,19 @@ /* Create the pass-specific data structures for separately shrink-wrapping with components COMPONENTS. */ -static void -init_separate_shrink_wrap (sbitmap components) +void +sw_sep_base::init () { basic_block bb; FOR_ALL_BB_FN (bb, cfun) { bb->aux = xcalloc (1, sizeof (struct sw)); - SW (bb)->needs_components = targetm.shrink_wrap.components_for_bb (bb); - - if (dump_file) - { - fprintf (dump_file, "bb %d components:", bb->index); - dump_components ("has", SW (bb)->needs_components); - fprintf (dump_file, "\n"); - } + SW (bb)->needs_components = components_for_bb (bb); - SW (bb)->has_components = sbitmap_alloc (SBITMAP_SIZE (components)); - SW (bb)->head_components = sbitmap_alloc (SBITMAP_SIZE (components)); - SW (bb)->tail_components = sbitmap_alloc (SBITMAP_SIZE (components)); + SW (bb)->has_components = sbitmap_alloc (SBITMAP_SIZE (m_components)); + SW (bb)->head_components = sbitmap_alloc (SBITMAP_SIZE (m_components)); + SW (bb)->tail_components = sbitmap_alloc (SBITMAP_SIZE (m_components)); bitmap_clear (SW (bb)->has_components); bitmap_clear (SW (bb)->head_components); bitmap_clear (SW (bb)->tail_components); @@ -1128,8 +1173,8 @@ } /* Destroy the pass-specific data. */ -static void -fini_separate_shrink_wrap (void) +void +sw_sep_base::fini (void) { basic_block bb; FOR_ALL_BB_FN (bb, cfun) @@ -1144,13 +1189,132 @@ } } +sbitmap +sw_sep::components_for_bb (basic_block bb) +{ + sbitmap b = targetm.shrink_wrap.components_for_bb (bb); + + if (dump_file) + { + fprintf (dump_file, "bb %d components:", bb->index); + dump_components ("has", b); + fprintf (dump_file, "\n"); + } + return b; +} + +void +sw_sep::disqualify (edge e, sbitmap edge_components, bool is_prologue) +{ + targetm.shrink_wrap.disqualify_components (m_components, e, edge_components, + is_prologue); +} + +/* Call the target hook to emit epilogue components for EPI, and place + the insns on edge E. */ +void +sw_sep::emit_epilogue_on_edge (edge e, sbitmap epi) +{ + start_sequence (); + targetm.shrink_wrap.emit_epilogue_components (epi); + rtx_insn *seq = get_insns (); + end_sequence (); + + if (e->flags & EDGE_SIBCALL) + { + gcc_assert (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)); + + rtx_insn *insn = BB_END (e->src); + gcc_assert (CALL_P (insn) && SIBLING_CALL_P (insn)); + emit_insn_before (seq, insn); + } + else if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) + { + gcc_assert (e->flags & EDGE_FALLTHRU); + basic_block new_bb = split_edge (e); + emit_insn_after (seq, BB_END (new_bb)); + } + else + insert_insn_on_edge (seq, e); +} + +/* Call the target hook to emit prologue components for EPI, and place + the insns on edge E. */ +void +sw_sep::emit_prologue_on_edge (edge e, sbitmap pro) +{ + start_sequence (); + targetm.shrink_wrap.emit_prologue_components (pro); + rtx_insn *seq = get_insns (); + end_sequence (); + + insert_insn_on_edge (seq, e); +} + +/* Call the target hook to emit epilogue components for EPI, and place + the insns in basic block BB, at the start if AT_HEAD, at the end + otherwise. */ +void +sw_sep::emit_epilogue_into_bb (basic_block bb, sbitmap epi, bool at_head) +{ + start_sequence (); + targetm.shrink_wrap.emit_epilogue_components (epi); + rtx_insn *seq = get_insns (); + end_sequence (); + + if (at_head) + emit_insn_after (seq, bb_note (bb)); + else + { + rtx_insn *insn = BB_END (bb); + if (control_flow_insn_p (insn)) + emit_insn_before (seq, insn); + else + emit_insn_after (seq, insn); + } +} + +/* Call the target hook to emit prologue components for PRO, and place + the insns in basic block BB, at the start if AT_HEAD, at the end + otherwise. */ +void +sw_sep::emit_prologue_into_bb (basic_block bb, sbitmap pro, bool at_head) +{ + start_sequence (); + targetm.shrink_wrap.emit_prologue_components (pro); + rtx_insn *seq = get_insns (); + end_sequence (); + + if (at_head) + emit_insn_after (seq, bb_note (bb)); + else + { + rtx_insn *insn = BB_END (bb); + if (control_flow_insn_p (insn)) + emit_insn_before (seq, insn); + else + emit_insn_after (seq, insn); + } +} + +/* Called by the engine if separate shrink-wrapping was successful for + some components. */ +void +sw_sep::set_handled () +{ + targetm.shrink_wrap.set_handled_components (m_components); + + crtl->shrink_wrapped_separate = true; +} + /* Place the prologue for component WHICH, in the basic blocks dominated by HEAD. Do a DFS over the dominator tree, and set bit WHICH in the HAS_COMPONENTS of a block if either the block has that bit set in NEEDS_COMPONENTS, or it is cheaper to place the prologue here than in all dominator subtrees separately. */ -static void -place_prologue_for_one_component (unsigned int which, basic_block head) +void +sw_sep_base::place_prologue_for_one_component (unsigned int which, + basic_block head) { /* The block we are currently dealing with. */ basic_block bb = head; @@ -1246,8 +1410,8 @@ /* Mark HAS_COMPONENTS for every block dominated by at least one block with PRO_COMPONENTS set for the respective components, starting at HEAD. */ -static void -spread_components (basic_block head) +void +sw_sep_base::spread_components (basic_block head) { basic_block bb = head; bool first_visit = true; @@ -1291,11 +1455,11 @@ /* If we cannot handle placing some component's prologues or epilogues where we decided we should place them, unmark that component in COMPONENTS so that it is not wrapped separately. */ -static void -disqualify_problematic_components (sbitmap components) +void +sw_sep_base::disqualify_problematic_components () { - sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (components)); - sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (components)); + sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (m_components)); + sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (m_components)); basic_block bb; FOR_EACH_BB_FN (bb, cfun) @@ -1312,11 +1476,9 @@ /* Ask the target what it thinks about things. */ if (!bitmap_empty_p (epi)) - targetm.shrink_wrap.disqualify_components (components, e, epi, - false); + disqualify (e, epi, false); if (!bitmap_empty_p (pro)) - targetm.shrink_wrap.disqualify_components (components, e, pro, - true); + disqualify (e, pro, true); /* If this edge doesn't need splitting, we're fine. */ if (single_pred_p (e->dest) @@ -1336,10 +1498,10 @@ /* Remove from consideration those components we would need pro/epilogues for on edges where we cannot insert them. */ - bitmap_and_compl (components, components, epi); - bitmap_and_compl (components, components, pro); + bitmap_and_compl (m_components, m_components, epi); + bitmap_and_compl (m_components, m_components, pro); - if (dump_file && !bitmap_subset_p (epi, components)) + if (dump_file && !bitmap_subset_p (epi, m_components)) { fprintf (dump_file, " BAD epi %d->%d", e->src->index, e->dest->index); @@ -1349,7 +1511,7 @@ fprintf (dump_file, "\n"); } - if (dump_file && !bitmap_subset_p (pro, components)) + if (dump_file && !bitmap_subset_p (pro, m_components)) { fprintf (dump_file, " BAD pro %d->%d", e->src->index, e->dest->index); @@ -1367,12 +1529,12 @@ /* Place code for prologues and epilogues for COMPONENTS where we can put that code at the start of basic blocks. */ -static void -emit_common_heads_for_components (sbitmap components) +void +sw_sep_base::emit_common_heads_for_components () { - sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (components)); - sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (components)); - sbitmap tmp = sbitmap_alloc (SBITMAP_SIZE (components)); + sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (m_components)); + sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (m_components)); + sbitmap tmp = sbitmap_alloc (SBITMAP_SIZE (m_components)); basic_block bb; FOR_EACH_BB_FN (bb, cfun) @@ -1381,8 +1543,8 @@ predecessor edges to this block. */ /* First, select all possible components. */ - bitmap_copy (epi, components); - bitmap_copy (pro, components); + bitmap_copy (epi, m_components); + bitmap_copy (pro, m_components); edge e; edge_iterator ei; @@ -1421,24 +1583,14 @@ /* Place code after the BB note. */ if (!bitmap_empty_p (pro)) { - start_sequence (); - targetm.shrink_wrap.emit_prologue_components (pro); - rtx_insn *seq = get_insns (); - end_sequence (); - - emit_insn_after (seq, bb_note (bb)); + emit_prologue_into_bb (bb, pro, true); bitmap_ior (SW (bb)->head_components, SW (bb)->head_components, pro); } if (!bitmap_empty_p (epi)) { - start_sequence (); - targetm.shrink_wrap.emit_epilogue_components (epi); - rtx_insn *seq = get_insns (); - end_sequence (); - - emit_insn_after (seq, bb_note (bb)); + emit_epilogue_into_bb (bb, epi, true); bitmap_ior (SW (bb)->head_components, SW (bb)->head_components, epi); } @@ -1451,12 +1603,12 @@ /* Place code for prologues and epilogues for COMPONENTS where we can put that code at the end of basic blocks. */ -static void -emit_common_tails_for_components (sbitmap components) +void +sw_sep_base::emit_common_tails_for_components () { - sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (components)); - sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (components)); - sbitmap tmp = sbitmap_alloc (SBITMAP_SIZE (components)); + sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (m_components)); + sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (m_components)); + sbitmap tmp = sbitmap_alloc (SBITMAP_SIZE (m_components)); basic_block bb; FOR_EACH_BB_FN (bb, cfun) @@ -1467,8 +1619,8 @@ continue; /* First, select all possible components. */ - bitmap_copy (epi, components); - bitmap_copy (pro, components); + bitmap_copy (epi, m_components); + bitmap_copy (pro, m_components); edge e; edge_iterator ei; @@ -1510,32 +1662,13 @@ /* Put the code at the end of the BB, but before any final jump. */ if (!bitmap_empty_p (epi)) { - start_sequence (); - targetm.shrink_wrap.emit_epilogue_components (epi); - rtx_insn *seq = get_insns (); - end_sequence (); - - rtx_insn *insn = BB_END (bb); - if (control_flow_insn_p (insn)) - emit_insn_before (seq, insn); - else - emit_insn_after (seq, insn); - + emit_epilogue_into_bb (bb, epi, false); bitmap_ior (SW (bb)->tail_components, SW (bb)->tail_components, epi); } if (!bitmap_empty_p (pro)) { - start_sequence (); - targetm.shrink_wrap.emit_prologue_components (pro); - rtx_insn *seq = get_insns (); - end_sequence (); - - rtx_insn *insn = BB_END (bb); - if (JUMP_P (insn) || (CALL_P (insn) && SIBLING_CALL_P (insn))) - emit_insn_before (seq, insn); - else - emit_insn_after (seq, insn); + emit_prologue_into_bb (bb, pro, false); bitmap_ior (SW (bb)->tail_components, SW (bb)->tail_components, pro); } @@ -1548,11 +1681,11 @@ /* Place prologues and epilogues for COMPONENTS on edges, if we haven't already placed them inside blocks directly. */ -static void -insert_prologue_epilogue_for_components (sbitmap components) +void +sw_sep_base::insert_prologue_epilogue_for_components () { - sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (components)); - sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (components)); + sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (m_components)); + sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (m_components)); basic_block bb; FOR_EACH_BB_FN (bb, cfun) @@ -1569,8 +1702,8 @@ SW (e->dest)->has_components); bitmap_and_compl (pro, SW (e->dest)->has_components, SW (e->src)->has_components); - bitmap_and (epi, epi, components); - bitmap_and (pro, pro, components); + bitmap_and (epi, epi, m_components); + bitmap_and (pro, pro, m_components); /* Deselect those we already have put at the head or tail of the edge's dest resp. src. */ @@ -1590,36 +1723,8 @@ fprintf (dump_file, "\n"); } - /* Put the epilogue components in place. */ - start_sequence (); - targetm.shrink_wrap.emit_epilogue_components (epi); - rtx_insn *seq = get_insns (); - end_sequence (); - - if (e->flags & EDGE_SIBCALL) - { - gcc_assert (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)); - - rtx_insn *insn = BB_END (e->src); - gcc_assert (CALL_P (insn) && SIBLING_CALL_P (insn)); - emit_insn_before (seq, insn); - } - else if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) - { - gcc_assert (e->flags & EDGE_FALLTHRU); - basic_block new_bb = split_edge (e); - emit_insn_after (seq, BB_END (new_bb)); - } - else - insert_insn_on_edge (seq, e); - - /* Put the prologue components in place. */ - start_sequence (); - targetm.shrink_wrap.emit_prologue_components (pro); - seq = get_insns (); - end_sequence (); - - insert_insn_on_edge (seq, e); + emit_epilogue_on_edge (e, epi); + emit_prologue_on_edge (e, pro); } } } @@ -1630,6 +1735,62 @@ commit_edge_insertions (); } +sbitmap +sw_sep_base::run () +{ + init (); + + sbitmap_iterator sbi; + unsigned int j; + EXECUTE_IF_SET_IN_BITMAP (m_components, 0, j, sbi) + place_prologue_for_one_component (j, m_first_bb); + + spread_components (m_first_bb); + + disqualify_problematic_components (); + + /* Don't separately shrink-wrap anything where the "main" prologue will + go; the target code can often optimize things if it is presented with + all components together (say, if it generates store-multiple insns). */ + bitmap_and_compl (m_components, m_components, + SW (m_first_bb)->has_components); + + if (bitmap_empty_p (m_components)) + { + if (dump_file) + fprintf (dump_file, "Not wrapping anything separately.\n"); + return m_components; + } + if (dump_file) + { + fprintf (dump_file, "The components we wrap separately are"); + dump_components ("sep", m_components); + fprintf (dump_file, "\n"); + + fprintf (dump_file, "... Inserting common heads...\n"); + } + + emit_common_heads_for_components (); + + if (dump_file) + fprintf (dump_file, "... Inserting common tails...\n"); + + emit_common_tails_for_components (); + + if (dump_file) + fprintf (dump_file, "... Inserting the more difficult ones...\n"); + + insert_prologue_epilogue_for_components (); + + if (dump_file) + fprintf (dump_file, "... Done.\n"); + + set_handled (); + + fini (); + return m_components; +} + /* The main entry point to this subpass. FIRST_BB is where the prologue would be normally put. */ void @@ -1663,61 +1824,310 @@ calculate_dominance_info (CDI_DOMINATORS); calculate_dominance_info (CDI_POST_DOMINATORS); - init_separate_shrink_wrap (components); + sw_sep engine (first_bb, components); + components = engine.run (); + engine.fini (); - sbitmap_iterator sbi; - unsigned int j; - EXECUTE_IF_SET_IN_BITMAP (components, 0, j, sbi) - place_prologue_for_one_component (j, first_bb); + sbitmap_free (components); + free_dominance_info (CDI_DOMINATORS); + free_dominance_info (CDI_POST_DOMINATORS); +} - spread_components (first_bb); +#if CHECKING_P - disqualify_problematic_components (components); +const int n_components_for_test = 32; +const int n_cfgs_to_try = 32; +const int n_paths_to_try = 256; +const int max_path_len = 32; - /* Don't separately shrink-wrap anything where the "main" prologue will - go; the target code can often optimize things if it is presented with - all components together (say, if it generates store-multiple insns). */ - bitmap_and_compl (components, components, SW (first_bb)->has_components); +namespace selftest { - if (bitmap_empty_p (components)) +/* A derived class that performs a self-test of the separate + shrink-wrapping algorithm. */ +class sw_test_sep : public sw_sep_base +{ + hash_map m_orig_components; + hash_map m_pro_added_to_head; + hash_map m_epi_added_to_head; + hash_map m_pro_added_to_end; + hash_map m_epi_added_to_end; + hash_map m_pro_added_to_edge; + hash_map m_epi_added_to_edge; + + sbitmap components_for_bb (basic_block); + void disqualify (edge, sbitmap, bool); + void emit_prologue_into_bb (basic_block, sbitmap, bool); + void emit_epilogue_into_bb (basic_block, sbitmap, bool); + void emit_prologue_on_edge (edge, sbitmap); + void emit_epilogue_on_edge (edge, sbitmap); + void set_handled (); + + public: + void fini (); + + sw_test_sep (basic_block first_bb, sbitmap components) + : sw_sep_base (first_bb, components) + { + } +}; + +sbitmap +sw_test_sep::components_for_bb (basic_block bb) +{ + sbitmap components = sbitmap_alloc (n_components_for_test); + bitmap_clear (components); + if (bb != ENTRY_BLOCK_PTR_FOR_FN (cfun) + && bb != EXIT_BLOCK_PTR_FOR_FN (cfun)) + for (int i = 0; i < n_components_for_test; i++) + if (random () & 1) + bitmap_set_bit (components, i); + m_orig_components.put (bb, components); + return components; +} + +void +sw_test_sep::disqualify (edge, sbitmap, bool) +{ + /* This mechanism isn't tested yet. */ +} + +/* Call the target hook to emit epilogue components for EPI, and place + the insns on edge E. */ +void +sw_test_sep::emit_epilogue_on_edge (edge e, sbitmap epi) +{ + sbitmap *p = m_epi_added_to_edge.get (e); + sbitmap mask; + if (p == NULL) { - if (dump_file) - fprintf (dump_file, "Not wrapping anything separately.\n"); + mask = sbitmap_alloc (n_components_for_test); + bitmap_clear (mask); + m_epi_added_to_edge.put (e, mask); } else + mask = *p; + + ASSERT_FALSE (bitmap_intersect_p (mask, epi)); + bitmap_ior (mask, mask, epi); +} + +/* Call the target hook to emit prologue components for EPI, and place + the insns on edge E. */ +void +sw_test_sep::emit_prologue_on_edge (edge e, sbitmap pro) +{ + sbitmap *p = m_pro_added_to_edge.get (e); + sbitmap mask; + if (p == NULL) { - if (dump_file) - { - fprintf (dump_file, "The components we wrap separately are"); - dump_components ("sep", components); - fprintf (dump_file, "\n"); + mask = sbitmap_alloc (n_components_for_test); + bitmap_clear (mask); + m_pro_added_to_edge.put (e, mask); + } + else + mask = *p; - fprintf (dump_file, "... Inserting common heads...\n"); - } + ASSERT_FALSE (bitmap_intersect_p (mask, pro)); + bitmap_ior (mask, mask, pro); +} - emit_common_heads_for_components (components); +/* Call the target hook to emit epilogue components for EPI, and place + the insns in basic block BB, at the start if AT_HEAD, at the end + otherwise. */ +void +sw_test_sep::emit_epilogue_into_bb (basic_block bb, sbitmap epi, bool at_head) +{ + hash_map *map; + map = at_head ? &m_epi_added_to_head : &m_epi_added_to_end; + sbitmap *p = map->get (bb); + sbitmap mask; + if (p == NULL) + { + mask = sbitmap_alloc (n_components_for_test); + bitmap_clear (mask); + map->put (bb, mask); + } + else + mask = *p; - if (dump_file) - fprintf (dump_file, "... Inserting common tails...\n"); + ASSERT_FALSE (bitmap_intersect_p (mask, epi)); + bitmap_ior (mask, mask, epi); +} - emit_common_tails_for_components (components); +/* Call the target hook to emit prologue components for PRO, and place + the insns in basic block BB, at the start if AT_HEAD, at the end + otherwise. */ +void +sw_test_sep::emit_prologue_into_bb (basic_block bb, sbitmap pro, bool at_head) +{ + hash_map *map; + map = at_head ? &m_pro_added_to_head : &m_pro_added_to_end; + sbitmap *p = map->get (bb); + sbitmap mask; + if (p == NULL) + { + mask = sbitmap_alloc (n_components_for_test); + bitmap_clear (mask); + map->put (bb, mask); + } + else + mask = *p; - if (dump_file) - fprintf (dump_file, "... Inserting the more difficult ones...\n"); + ASSERT_FALSE (bitmap_intersect_p (mask, pro)); + bitmap_ior (mask, mask, pro); +} - insert_prologue_epilogue_for_components (components); +/* Called by the engine if separate shrink-wrapping was successful for + some components. For the test, we use this to verify the result. */ +void +sw_test_sep::set_handled () +{ + basic_block bb; + /* Verify that we didn't added prologue and epilogue for the same + component in the same place. */ + FOR_EACH_BB_FN (bb, cfun) + { + sbitmap *pro = m_pro_added_to_head.get (bb); + sbitmap *epi = m_epi_added_to_head.get (bb); + if (pro && epi) + ASSERT_FALSE (bitmap_intersect_p (*pro, *epi)); + pro = m_pro_added_to_end.get (bb); + epi = m_epi_added_to_end.get (bb); + if (pro && epi) + ASSERT_FALSE (bitmap_intersect_p (*pro, *epi)); + } - if (dump_file) - fprintf (dump_file, "... Done.\n"); + /* Try random paths through the function, and verify that + (a) if a block needs a component, it has been saved + (b) no component is saved or restored more than once in + a row. */ - targetm.shrink_wrap.set_handled_components (components); + sbitmap active = sbitmap_alloc (n_components_for_test); + for (int i = 0; i < n_paths_to_try; i++) + { + basic_block curr = ENTRY_BLOCK_PTR_FOR_FN (cfun); + bitmap_clear (active); - crtl->shrink_wrapped_separate = true; + for (int len = 0; len < max_path_len; len++) + { + /* Update for the bb head. */ + sbitmap *pro_entry = m_pro_added_to_head.get (curr); + sbitmap *epi_entry = m_epi_added_to_head.get (curr); + if (epi_entry) + { + ASSERT_FALSE (bitmap_intersect_compl_p (*epi_entry, active)); + bitmap_and_compl (active, active, *epi_entry); + } + if (pro_entry) + { + ASSERT_FALSE (bitmap_intersect_p (*pro_entry, active)); + bitmap_ior (active, active, *pro_entry); + } + + /* Make sure the required components are active. */ + sbitmap *required = m_orig_components.get (curr); + if (required) + ASSERT_FALSE (bitmap_intersect_compl_p (*required, active)); + + /* Update for the bb end. */ + sbitmap *pro_end = m_pro_added_to_end.get (curr); + sbitmap *epi_end = m_epi_added_to_end.get (curr); + if (epi_end) + { + ASSERT_FALSE (bitmap_intersect_compl_p (*epi_end, active)); + bitmap_and_compl (active, active, *epi_end); + } + if (pro_end) + { + ASSERT_FALSE (bitmap_intersect_p (*pro_end, active)); + bitmap_ior (active, active, *pro_end); + } + + int l = EDGE_COUNT (curr->succs); + if (l == 0) + break; + int n = random () % l; + edge e = EDGE_SUCC (curr, n); + + /* Update for the edge. */ + sbitmap *pro_edge = m_pro_added_to_edge.get (e); + sbitmap *epi_edge = m_epi_added_to_edge.get (e); + if (epi_edge) + { + ASSERT_FALSE (bitmap_intersect_compl_p (*epi_edge, active)); + bitmap_and_compl (active, active, *epi_edge); + } + if (pro_edge) + { + ASSERT_FALSE (bitmap_intersect_p (*pro_edge, active)); + bitmap_ior (active, active, *pro_edge); + } + + curr = e->dest; + if (curr == EXIT_BLOCK_PTR_FOR_FN (cfun)) + break; + } } +} - fini_separate_shrink_wrap (); +void +sw_test_sep::fini () +{ + basic_block bb; + FOR_EACH_BB_FN (bb, cfun) + { + sbitmap *proh = m_pro_added_to_head.get (bb); + sbitmap *epih = m_epi_added_to_head.get (bb); + sbitmap *proe = m_pro_added_to_end.get (bb); + sbitmap *epie = m_epi_added_to_end.get (bb); + if (proh) + sbitmap_free (*proh); + if (epih) + sbitmap_free (*epih); + if (proe) + sbitmap_free (*proe); + if (epie) + sbitmap_free (*epie); + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, bb->preds) + { + sbitmap *pro = m_pro_added_to_edge.get (e); + sbitmap *epi = m_epi_added_to_edge.get (e); + if (pro) + sbitmap_free (*pro); + if (epi) + sbitmap_free (*epi); + } + } + + sw_sep_base::fini (); +} + +void +shrink_wrapping_separate_tests () +{ + /* Ask the target what components there are. If it returns NULL, don't + do anything. */ + sbitmap components = sbitmap_alloc (n_components_for_test); + bitmap_ones (components); + + basic_block first = build_random_cfg (5); + + calculate_dominance_info (CDI_DOMINATORS); + calculate_dominance_info (CDI_POST_DOMINATORS); + + sw_test_sep engine (first, components); + components = engine.run (); + engine.fini (); sbitmap_free (components); free_dominance_info (CDI_DOMINATORS); free_dominance_info (CDI_POST_DOMINATORS); + + free_random_cfg (); } + +} // namespace selftest + +#endif diff '--width=118' -x .svn -dru sw-selftest-base/gcc/tree-cfg.c with-selftest/gcc/tree-cfg.c --- sw-selftest-base/gcc/tree-cfg.c 2016-08-25 13:49:37.443942684 +0200 +++ with-selftest/gcc/tree-cfg.c 2016-08-26 14:45:06.643454608 +0200 @@ -9234,6 +9234,110 @@ return fndecl; } +static basic_block +build_single_after (vec *in_exits, vec *exits) +{ + basic_block bb_a = create_empty_bb ((*in_exits)[0]); + exits->safe_push (bb_a); + basic_block bb; + unsigned i; + FOR_EACH_VEC_ELT (*in_exits, i, bb) + make_edge (bb, bb_a, i == 0 ? EDGE_FALLTHRU : 0); + return bb_a; +} + +static basic_block +build_if (function *fun, basic_block thn, vec *exits) +{ + basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fun); + basic_block bb_a = create_empty_bb (entry); + make_edge (bb_a, thn, EDGE_TRUE_VALUE); + exits->safe_push (bb_a); + return bb_a; +} + +static basic_block +build_ifelse (function *fun, basic_block thn, basic_block els) +{ + basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fun); + basic_block bb_a = create_empty_bb (entry); + make_edge (bb_a, thn, EDGE_TRUE_VALUE); + make_edge (bb_a, els, EDGE_FALSE_VALUE); + return bb_a; +} + +static basic_block +build_loop (basic_block inner_head, vec *inner_exits, + vec *exits) +{ + build_single_after (inner_exits, exits); + basic_block bb; + unsigned i; + FOR_EACH_VEC_ELT (*inner_exits, i, bb) + make_edge (bb, inner_head, 0); + + return inner_head; +} + +static basic_block +recursive_build_cfg (function *fun, int maxdepth, vec *exits) +{ + if (maxdepth == 0) + { + basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fun); + basic_block bb_a = create_empty_bb (entry); + exits->safe_push (bb_a); + return bb_a; + } + int subdepth1 = random () % maxdepth; + int subdepth2 = random () % maxdepth; + auto_vec inner_exits; + basic_block inner, inner2; + + switch (random () % 4) + { + case 0: + inner = recursive_build_cfg (fun, subdepth1, exits); + return build_if (fun, inner, exits); + case 1: + inner = recursive_build_cfg (fun, subdepth1, exits); + inner2 = recursive_build_cfg (fun, subdepth2, exits); + return build_ifelse (fun, inner, inner2); + case 2: + inner = recursive_build_cfg (fun, subdepth1, &inner_exits); + return build_loop (inner, &inner_exits, exits); + case 3: + inner = recursive_build_cfg (fun, subdepth1, &inner_exits); + build_single_after (&inner_exits, exits); + return inner; + default: + gcc_unreachable (); + } +} + +basic_block +build_random_cfg (int maxdepth) +{ + gimple_register_cfg_hooks (); + + tree fndecl = push_fndecl ("cfg_test_random"); + function *fun = DECL_STRUCT_FUNCTION (fndecl); + + auto_vec rgn_exits; + basic_block head = recursive_build_cfg (fun, maxdepth, &rgn_exits); + make_edge (ENTRY_BLOCK_PTR_FOR_FN (fun), head, EDGE_FALLTHRU); + basic_block bb; + unsigned i; + FOR_EACH_VEC_ELT (rgn_exits, i, bb) + make_edge (bb, EXIT_BLOCK_PTR_FOR_FN (fun), 0); + return head; +} + +void free_random_cfg () +{ + pop_cfun (); +} + /* These tests directly create CFGs. Compare with the static fns within tree-cfg.c: - build_gimple_cfg --------------C2DF0FBABC6D00206892E5F8--