Date: Sun, 12 May 2019 18:31:00 -0000
From: Giuliano Belinassi
To: Richard Biener
Cc: Richard Biener, David Malcolm, GCC Development
Subject: Re: GSOC
Message-ID: 
<20190512183052.hccsbj7kqimiwkot@smtp.gmail.com>
References: <176a02b4-ed71-4a42-fb76-09570f303991@gmail.com> <1553607171.18132.95.camel@redhat.com> <20190327135515.qsu7kka5mukw375e@smtp.gmail.com> <20190328202025.ackul3dtgrwvscur@smtp.gmail.com> <20190506184735.xyfhrgqg6gdy7gsz@smtp.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="vk6y5y5jptwzh7f4"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: NeoMutt/20180716
X-IsSubscribed: yes
X-SW-Source: 2019-05/txt/msg00084.txt.bz2

--vk6y5y5jptwzh7f4
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-length: 14200

Hi, Richard

On 05/07, Richard Biener wrote:
> On Mon, 6 May 2019, Giuliano Belinassi wrote:
>
> > Hi,
> >
> > On 03/29, Richard Biener wrote:
> > > On Thu, 28 Mar 2019, Giuliano Belinassi wrote:
> > >
> > > > Hi, Richard
> > > >
> > > > On 03/28, Richard Biener wrote:
> > > > > On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On 03/26, Richard Biener wrote:
> > > > > > > On Tue, 26 Mar 2019, David Malcolm wrote:
> > > > > > >
> > > > > > > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > > > > > > Greetings All,
> > > > > > > > >
> > > > > > > > > I would like to take up parallelize compilation using threads
> > > > > > > > > or make c++/c memory issues not automatically promote. I did
> > > > > > > > > ask about this before but did not get a reply. When someone
> > > > > > > > > replies I'm just a little concerned as my writing for
> > > > > > > > > proposals has never been great so if someone just reviews
> > > > > > > > > and doubt checks that's fine.
> > > > > > > > >
> > > > > > > > > As for the other things building gcc and running the
> > > > > > > > > testsuite is fine. Plus I'm already working on gcc so I'm
> > > > > > > > > pretty aware of most things and this would be a great
> > > > > > > > > stepping stone into more serious gcc development work.
> > > > > > > > >
> > > > > > > > > If sample code is required that's in mainline gcc I sent out
> > > > > > > > > a trial patch for this issue:
> > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > > > Nick
> > > > > > > >
> > > > > > > > It's good to see that you've gotten as far as attaching a patch
> > > > > > > > to BZ [1]
> > > > > > > >
> > > > > > > > I think someone was going to attempt the "parallelize compilation
> > > > > > > > using threads" idea last year, but then pulled out before the
> > > > > > > > summer; you may want to check the archives (or was that you?)
> > > > > > >
> > > > > > > There's also Giuliano Belinassi who is interested in the same
> > > > > > > project (CCed).
> > > > > >
> > > > > > Yes, I will apply for this project, and I will submit the final version
> > > > > > of my proposal by the end of the week.
> > > > > >
> > > > > > Currently, my target is the `expand_all_functions` routine, as most of
> > > > > > the time is spent on it according to the experiments that I performed as
> > > > > > part of my Master's research on compiler parallelization.
> > > > > > (-O2, --disable-checking)
> > > > >
> > > > > Yes, more specifically I think the realistic target is the GIMPLE part
> > > > > of execute_pass_list (cfun, g->get_passes ()->all_passes); done in
> > > > > cgraph_node::expand. If you look at passes.def you'll see all_passes
> > > > > also contains RTL expansion (pass_expand) and the RTL optimization
> > > > > queue (pass_rest_of_compilation). The RTL part isn't a realistic target.
> > > > > Without changing the pass hierarchy the obvious part that can be
> > > > > handled would be the pass_all_optimizations pass sub-queue of
> > > > > all_passes since those are all passes that perform transforms on the
> > > > > GIMPLE IL where we have all functions in this state at the same time
> > > > > and where no interactions between the functions happen anymore
> > > > > and thus functions can be processed in parallel (as much as make
> > > > > processes individual translation units in parallel).
> > > > >
> > > >
> > > > Great. So if I understood correctly, I will need to split
> > > > cgraph_node::expand() into three parts: IPA, GIMPLE and RTL, and then
> > > > refactor `expand_all_functions` so that the loop
> > > >
> > > >      for (i = new_order_pos - 1; i >= 0; i--)
> > > >
> > > > use these three functions, then partition
> > > >
> > > >      g->get_passes()->all_passes
> > > >
> > > > into get_passes()->gimple_passes and get_passes()->rtl_passes, so I
> > > > can run RTL after GIMPLE is finished, to finally start the
> > > > parallelization of per function GIMPLE passes.
> > > >
> > > Yes, it involves refactoring of the loop - you may notice that
> > > parts of the compilation pipeline are under control of the
> > > pass manager (passes.c) but some is still manually driven
> > > by symbol_table::compile. Whether it's more convenient to
> > > get more control stuffed to the pass manager and perform the
> > > threading under its control (I'd say that would be the cleaner
> > > design) or to try do this in the current ad-hoc parts remains
> > > to be seen. You can see symbol_table::compile hands over
> > > control to the pass manager multiple times, first ipa_passes ()
> > > then all_late_ipa_passes and finally the expand_all_functions code.
> > >
> > > I guess it would simplify things if you'd split pass_all_passes
> > > in passes.def at pass_expand like so:
> > >
> > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > index 2fcd80e53a3..bb0453b36a7 100644
> > > --- a/gcc/passes.def
> > > +++ b/gcc/passes.def
> > > @@ -403,11 +403,10 @@ along with GCC; see the file COPYING3. If not see
> > >    NEXT_PASS (pass_spectrev1);
> > >    NEXT_PASS (pass_warn_function_noreturn);
> > >    NEXT_PASS (pass_gen_hsail);
> > > +  TERMINATE_PASS_LIST (all_passes)
> > >
> > > -  NEXT_PASS (pass_expand);
> > > -
> > > -  NEXT_PASS (pass_rest_of_compilation);
> > > -  PUSH_INSERT_PASSES_WITHIN (pass_rest_of_compilation)
> > > +  INSERT_PASSES_AFTER (pass_rest_of_compilation)
> > > +      NEXT_PASS (pass_expand);
> > >        NEXT_PASS (pass_instantiate_virtual_regs);
> > >        NEXT_PASS (pass_into_cfg_layout_mode);
> > >        NEXT_PASS (pass_jump);
> > > @@ -505,6 +504,5 @@ along with GCC; see the file COPYING3. If not see
> > >            NEXT_PASS (pass_final);
> > >        POP_INSERT_PASSES ()
> > >        NEXT_PASS (pass_df_finish);
> > > -  POP_INSERT_PASSES ()
> > >    NEXT_PASS (pass_clean_state);
> > > -  TERMINATE_PASS_LIST (all_passes)
> > > +  TERMINATE_PASS_LIST (pass_rest_of_compilation)
> > >
> > > where to make things "work" again w/o threading you'd invoke
> > > execute_pass_list (cfun, g->get_passes ()->pass_rest_of_compilation)
> > > right after the all_passes invocation in cgraph_node::expand.
> > >
> > > You then can refactor things so the loop over the 'order' array
> > > is done twice, once over all_passes (the set you then parallelize)
> > > and once over pass_rest_of_compilation (which you can't parallelize
> > > because of being in RTL).
> > >
> >
> > I managed to get it working today. However, I found an issue with the
> > statistics_fini_pass() and pass_init_dump_file(), which I had to
> > comment, and force a `return false` for every case, respectively. Then I
> > managed to compile some programs correctly with -O2.
> > I have no idea why yet, but I will keep searching. I've attached my
> > patch here.
>
> It may be that you need to adjust the GCC_PASS_LISTS define in
> pass_manager.h, changing pass_rest_of_compilation to a pass list
> and also remove its "old" definition in passes.c. Or it might
> be simpler to not re-use pass_rest_of_compilation but wrap
> the tail in a new all_passes2 or so. You'll also see
> a call to register_dump_files (all_passes) in passes.c where
> you probably need to do the same for the new tail.
>
> As usual grep is your best friend when figuring out what to do
> (doing that right now myself).
>
> Richard.

I am just posting this as an update.

For context, I managed to fix this issue by following Biener's advice.
What happened next, however, was that pr87600-2.c crashed the compiler,
as discussed on IRC. The reason for the crash was that, after the
discussed modifications, the `pass->next` after `pass_expand` was not
`pass_cleanup`, since the NEXT_PASS after `pass_expand` was
`pass_instantiate_virtual_regs`. I fixed that by not reusing
`pass_rest_of_compilation`, and instead creating a new pass list called
`all_rtl_passes` and keeping `pass_expand` before the
`PUSH_INSERT_PASSES_WITHIN`.
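For readers following along, the point of having two separate lists is the refactoring suggested earlier in the thread: walk the function order twice, once over the GIMPLE list and once over the RTL list. The sketch below is only a toy model of that idea, not GCC code. `ToyPass`, `run_pass_list` and `compile_in_two_phases` are made-up names, and the real `opt_pass` and `execute_pass_list` do far more than this.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a pass; this is NOT GCC's opt_pass.
struct ToyPass {
  std::string name;
  const ToyPass *next;  // passes of one list are chained through `next`
};

// Run every pass of one list on one function, recording "function:pass".
void run_pass_list(const std::string &fn, const ToyPass *head,
                   std::vector<std::string> &log) {
  for (const ToyPass *p = head; p != nullptr; p = p->next)
    log.push_back(fn + ":" + p->name);
}

// Walk the function order twice: the GIMPLE list first, then the RTL list.
std::vector<std::string> compile_in_two_phases(
    const std::vector<std::string> &order, const ToyPass *gimple_list,
    const ToyPass *rtl_list) {
  std::vector<std::string> log;
  for (const std::string &fn : order)  // phase 1: candidate for threading
    run_pass_list(fn, gimple_list, log);
  for (const std::string &fn : order)  // phase 2: stays serial
    run_pass_list(fn, rtl_list, log);
  return log;
}
```

Because each list is terminated on its own, walking one chain can never run into the head of the other, which is the property the split relies on. The attached patch does not go this far yet: it still runs both lists back to back for each function inside cgraph_node::expand.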

After that, I ran the full `dg.exp` testsuite and found that the
following tests are also failing:

FAIL: gcc.dg/20020919-1.c (internal compiler error)
FAIL: gcc.dg/always_inline3.c (internal compiler error)
FAIL: gcc.dg/dx-test.c (internal compiler error)
FAIL: gcc.dg/pr32370.c (internal compiler error)
FAIL: gcc.dg/pr61692.c (internal compiler error)
FAIL: gcc.dg/pr87600-2.c (internal compiler error)
FAIL: gcc.dg/va-arg-pack-1.c (internal compiler error)
FAIL: g++.dg/cpp0x/Wattributes1.C -std=c++14 (test for warnings, line 125)
FAIL: g++.dg/cpp0x/Wattributes1.C -std=c++17 (test for warnings, line 125)
FAIL: g++.dg/opt/pr69213.C -std=gnu++98 (internal compiler error)
FAIL: g++.dg/opt/pr69213.C -std=gnu++14 (internal compiler error)
FAIL: g++.dg/opt/pr69213.C -std=gnu++17 (internal compiler error)

Notice that pr87600-2.c is failing again, but now when trying to compile
the `long test3 (void)` function. Interestingly, if I comment out every
function except test3(), the test passes.

All tests seem to fail at the assert:

    gcc_assert (TREE_ASM_WRITTEN (decl))

at cgraphunit.c:2255, and the RTL dumps always seem to differ from
trunk.

I've attached a patch with my modifications, in case anyone is
interested. I hope to provide better feedback in the next few days.

> >
> > >
> > > The above patch needs more changes in pass manager code - a chance
> > > to dive into it a little since that's where you'd change code.
> > >
> > > > > To simplify the task further a useful constraint is to not have
> > > > > a single optimization pass executed multiple times at the same time
> > > > > (otherwise you have to look at pass specific global states as well),
> > > > > thus the parallel part could be coded in a way keeping per function
> > > > > the state of what pass to execute next and have a scheduler pick
> > > > > a function whose next pass is "free", scheduling that to a fixed set of
> > > > > worker threads.
> > > > > There's no dependences between functions
> > > > > for the scheduling but each pass has only one execution resource
> > > > > in the pipeline. You can start processing an arbitrarily large number
> > > > > of functions but slow functions will keep others from advancing across
> > > > > the pass it executes on.
> > > > >
> > > >
> > > > Something like a pipeline? That is certainly a start, but if one pass is
> > > > very slow wouldn't it bottleneck everything?
> > > >
> > > Yes, something like a pipeline. It's true a slow pass would
> > > bottleneck things - as said, we can selectively make passes
> > > thread safe in such cases.
> > >
> > > >
> > > > > Passes could of course be individually marked as thread-safe
> > > > > (multiple instances execute concurrently).
> > > > >
> > > > > Garbage collection is already in control of the pass manager which
> > > > > would also be the thread scheduler. For GC the remaining issue
> > > > > is allocation which passes occasionally do. Locking is the short
> > > > > term solution for GSoC I guess, long-term per-thread GC pools
> > > > > might be better (to not slow down non-threaded parts of the compiler).
> > > > >
> > > > > Richard.
> > > > >
> > > > > >
> > > > > > Thank you,
> > > > > > Giuliano.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > > IIRC Richard [CCed] was going to mentor, with me co-mentoring [2] - but
> > > > > > > > I don't know if he's still interested/able to spare the cycles.
> > > > > > >
> > > > > > > I've offered mentoring to Giuliano, so yes.
> > > > > > >
> > > > > > > > That said, the parallel compilation one strikes me as very ambitious;
> > > > > > > > it's not clear to me what could realistically be done as a GSoC
> > > > > > > > project. I think a good proposal on that would come up with some
> > > > > > > > subset of the problem that's doable over a summer, whilst also being
> > > > > > > > useful to the project. The RTL infrastructure has a lot of global
> > > > > > > > state, so maybe either focus on the gimple passes, or on fixing global
> > > > > > > > state on the RTL side? (I'm not sure)
> > > > > > >
> > > > > > > That was the original intent for the experiment. There's also
> > > > > > > the already somewhat parallel WPA stage in LTO compilation mode
> > > > > > > (but it simply forks for the sake of simplicity...).
> > > > > > >
> > > > > > > > Or maybe a project to be more
> > > > > > > > explicit about regions of the code that assume that the
> > > > > > > > garbage-collector can't run within them?[3] (since the GC is
> > > > > > > > state that would be shared by the threads).
> > > > > > >
> > > > > > > The GC will be one obstacle. The original idea was to drive
> > > > > > > parallelization on the pass level by the pass manager for the
> > > > > > > GIMPLE passes, so serialization points would be in it.
> > > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > > > Hope this is constructive/helpful
> > > > > > > > Dave
> > > > > > > >
> > > > > > > > [1] though typically our workflow involved sending patches to the
> > > > > > > > gcc-patches mailing list
> > > > > > > > [2] as libgccjit maintainer I have an interest in global state within
> > > > > > > > the compiler
> > > > > > > > [3] I posted some ideas about this back in 2013 IIRC; probably
> > > > > > > > massively bit-rotted since then. I also gave a talk at Cauldron 2013
> > > > > > > > about global state in the compiler (with a view to
> > > > > > > > gcc-as-a-shared-library); likewise I expect much of the ideas
> > > > > > > > there to be out-of-date); for libgccjit I went with a different
> > > > > > > > approach
> > > >
> > > > Thank you,
> > > > Giuliano.
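
The pipeline-style scheduling described in the quoted text above (per-function "next pass" state, at most one running instance of each pass, a fixed number of workers) can be modelled without real threads. The sketch below is only an illustration under strong assumptions: every pass takes exactly one "tick", functions are picked in index order, and `Step` and `simulate_pipeline` are hypothetical names, not anything in GCC.

```cpp
#include <vector>

// One unit of work: run pass `pass` on function `function`.
struct Step {
  int function;
  int pass;
};

// Toy model of the scheduling policy. Returns, per tick, the steps that ran.
std::vector<std::vector<Step>> simulate_pipeline(int n_functions, int n_passes,
                                                 int n_workers) {
  std::vector<int> next_pass(n_functions, 0);  // per-function progress
  std::vector<std::vector<Step>> ticks;
  // Nothing to do if there are no passes or no workers.
  int remaining = (n_passes > 0 && n_workers > 0) ? n_functions : 0;
  while (remaining > 0) {
    std::vector<bool> pass_busy(n_passes, false);  // one instance per pass
    std::vector<Step> running;
    for (int f = 0;
         f < n_functions && static_cast<int>(running.size()) < n_workers; ++f) {
      int p = next_pass[f];
      if (p < n_passes && !pass_busy[p]) {  // function unfinished, pass free
        pass_busy[p] = true;
        running.push_back({f, p});
      }
    }
    // All steps of this tick finish together; advance the functions that ran.
    for (const Step &s : running)
      if (++next_pass[s.function] == n_passes)
        --remaining;
    ticks.push_back(running);
  }
  return ticks;
}
```

With three functions, two passes and two workers this model needs four ticks, against six with a single worker. The first tick runs only one step because every function is waiting for the same first pass, which is the bottleneck effect discussed above. A real implementation would also need worker threads and locking around shared state such as GC allocation.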
> > > >
> > >
> > > --
> > > Richard Biener
> > > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
> > >
> >
> --
> Richard Biener
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Giuliano.

--vk6y5y5jptwzh7f4
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="0001-Split-rtl_passes-from-all_passes.patch"
Content-length: 4108

>From d3fcfa095d209550c814e878b744f7d9b2394729 Mon Sep 17 00:00:00 2001
From: Giuliano Belinassi
Date: Sun, 12 May 2019 13:07:57 -0300
Subject: Split rtl_passes from all_passes

Signed-off-by: Giuliano Belinassi
---
 gcc/cgraphunit.c   |  2 ++
 gcc/pass_manager.h |  2 ++
 gcc/passes.c       | 10 +++++++---
 gcc/passes.def     |  5 +++--
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 8bfbd0bb12f..2c730dfba12 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2196,6 +2196,8 @@ cgraph_node::expand (void)
   invoke_plugin_callbacks (PLUGIN_ALL_PASSES_START, NULL);
 
   execute_pass_list (cfun, g->get_passes ()->all_passes);
+  if (!(rtl_dump_and_exit || flag_syntax_only || seen_error ()))
+    execute_pass_list (cfun, g->get_passes ()->all_rtl_passes);
 
   /* Signal the end of passes.  */
   invoke_plugin_callbacks (PLUGIN_ALL_PASSES_END, NULL);
diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index 8c7d272f20f..473fc6a9df5 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -30,6 +30,7 @@ struct register_pass_info;
   DEF_PASS_LIST (all_small_ipa_passes) \
   DEF_PASS_LIST (all_regular_ipa_passes) \
   DEF_PASS_LIST (all_late_ipa_passes) \
+  DEF_PASS_LIST (all_rtl_passes) \
   DEF_PASS_LIST (all_passes)
 
 #define DEF_PASS_LIST(LIST) PASS_LIST_NO_##LIST,
@@ -92,6 +93,7 @@ public:
   opt_pass *all_lowering_passes;
   opt_pass *all_regular_ipa_passes;
   opt_pass *all_late_ipa_passes;
+  opt_pass *all_rtl_passes;
 
   /* A map from static pass id to optimization pass.  */
   opt_pass **passes_by_id;
diff --git a/gcc/passes.c b/gcc/passes.c
index a9c22889d71..b76065470a8 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -958,6 +958,7 @@ pass_manager::dump_passes () const
   dump_pass_list (all_regular_ipa_passes, 1);
   dump_pass_list (all_late_ipa_passes, 1);
   dump_pass_list (all_passes, 1);
+  dump_pass_list (all_rtl_passes, 1);
 
   pop_dummy_function ();
 }
@@ -1468,6 +1469,8 @@ pass_manager::register_pass (struct register_pass_info *pass_info)
     success |= position_pass (pass_info, &all_late_ipa_passes);
   if (!success || all_instances)
     success |= position_pass (pass_info, &all_passes);
+  if (!success || all_instances)
+    success |= position_pass (pass_info, &all_rtl_passes);
   if (!success)
     fatal_error
       (input_location,
@@ -1515,9 +1518,9 @@ pass_manager::register_pass (struct register_pass_info *pass_info)
  */
 
 pass_manager::pass_manager (context *ctxt)
-: all_passes (NULL), all_small_ipa_passes (NULL), all_lowering_passes (NULL),
-  all_regular_ipa_passes (NULL),
-  all_late_ipa_passes (NULL), passes_by_id (NULL), passes_by_id_size (0),
+: all_passes (NULL), all_rtl_passes (NULL), all_small_ipa_passes (NULL), all_lowering_passes (NULL),
+  all_regular_ipa_passes (NULL), all_late_ipa_passes (NULL),
+  passes_by_id (NULL), passes_by_id_size (0),
   m_ctxt (ctxt), m_name_to_pass_map (NULL)
 {
   opt_pass **p;
@@ -1595,6 +1598,7 @@ pass_manager::pass_manager (context *ctxt)
   register_dump_files (all_regular_ipa_passes);
   register_dump_files (all_late_ipa_passes);
   register_dump_files (all_passes);
+  register_dump_files (all_rtl_passes);
 }
 
 static void
diff --git a/gcc/passes.def b/gcc/passes.def
index ad2efabd385..246775c1bed 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -395,9 +395,10 @@ along with GCC; see the file COPYING3. If not see
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
+  TERMINATE_PASS_LIST (all_passes)
 
+  INSERT_PASSES_AFTER (all_rtl_passes)
   NEXT_PASS (pass_expand);
-
   NEXT_PASS (pass_rest_of_compilation);
   PUSH_INSERT_PASSES_WITHIN (pass_rest_of_compilation)
       NEXT_PASS (pass_instantiate_virtual_regs);
@@ -499,4 +500,4 @@ along with GCC; see the file COPYING3. If not see
       NEXT_PASS (pass_df_finish);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_clean_state);
-  TERMINATE_PASS_LIST (all_passes)
+  TERMINATE_PASS_LIST (all_rtl_passes)
-- 
2.20.1

--vk6y5y5jptwzh7f4--