Date: Fri, 29 Mar 2019 08:48:00 -0000
From: Richard Biener
To: Giuliano Belinassi
Cc: Richard Biener, David Malcolm, nick, GCC Development, Martin Jambor
Subject: Re: GSOC
In-Reply-To: <20190328202025.ackul3dtgrwvscur@smtp.gmail.com>

On Thu, 28 Mar 2019, Giuliano Belinassi wrote:

> Hi, Richard
>
> On 03/28, Richard Biener wrote:
> > On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi wrote:
> > >
> > > Hi,
> > >
> > > On 03/26, Richard Biener wrote:
> > > > On Tue, 26 Mar 2019, David Malcolm wrote:
> > > > >
> > > > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > > > Greetings All,
> > > > > >
> > > > > > I would like to take up parallelizing compilation using threads, or
> > > > > > making C/C++ memory issues not automatically promote. I did ask about
> > > > > > this before but did not get a reply. I'm just a little concerned, as my
> > > > > > writing for proposals has never been great, so if someone just reviews
> > > > > > and double-checks it, that's fine.
> > > > > >
> > > > > > As for the other things, building gcc and running the testsuite is
> > > > > > fine. Plus I'm already working on gcc, so I'm pretty aware of most
> > > > > > things, and this would be a great stepping stone into more serious gcc
> > > > > > development work.
> > > > > >
> > > > > > If sample code that's in mainline gcc is required, I sent out a trial
> > > > > > patch for this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Nick
> > > > >
> > > > > It's good to see that you've gotten as far as attaching a patch to BZ
> > > > > [1].
> > > > >
> > > > > I think someone was going to attempt the "parallelize compilation using
> > > > > threads" idea last year, but then pulled out before the summer; you may
> > > > > want to check the archives (or was that you?)
> > > >
> > > > There's also Giuliano Belinassi, who is interested in the same project
> > > > (CCed).
> > > Yes, I will apply for this project, and I will submit the final version
> > > of my proposal by the end of the week.
> > >
> > > Currently, my target is the `expand_all_functions` routine, as most of
> > > the time is spent in it according to the experiments that I performed as
> > > part of my Master's research on compiler parallelization.
> > > (-O2, --disable-checking)
> >
> > Yes, more specifically I think the realistic target is the GIMPLE part
> > of execute_pass_list (cfun, g->get_passes ()->all_passes); done in
> > cgraph_node::expand. If you look at passes.def you'll see all_passes
> > also contains RTL expansion (pass_expand) and the RTL optimization
> > queue (pass_rest_of_compilation). The RTL part isn't a realistic target.
> > Without changing the pass hierarchy, the obvious part that can be
> > handled would be the pass_all_optimizations pass sub-queue of
> > all_passes, since those are all passes that perform transforms on the
> > GIMPLE IL, where we have all functions in this state at the same time
> > and where no interactions between the functions happen anymore,
> > and thus functions can be processed in parallel (much as make
> > processes individual translation units in parallel).
> >
>
> Great. So if I understood correctly, I will need to split
> cgraph_node::expand() into three parts: IPA, GIMPLE and RTL, and then
> refactor `expand_all_functions` so that the loop
>
> for (i = new_order_pos - 1; i >= 0; i--)
>
> uses these three functions, then partition
>
> g->get_passes()->all_passes
>
> into get_passes()->gimple_passes and get_passes()->rtl_passes, so I
> can run RTL after GIMPLE is finished, to finally start the
> parallelization of per-function GIMPLE passes.

Yes, it involves refactoring of the loop - you may notice that parts of
the compilation pipeline are under the control of the pass manager
(passes.c) but some of it is still manually driven by
symbol_table::compile. Whether it's more convenient to get more control
stuffed into the pass manager and perform the threading under its
control (I'd say that would be the cleaner design) or to try to do this
in the current ad-hoc parts remains to be seen. You can see
symbol_table::compile hands over control to the pass manager multiple
times, first ipa_passes (), then all_late_ipa_passes and finally the
expand_all_functions code.

I guess it would simplify things if you'd split all_passes in
passes.def at pass_expand like so:

diff --git a/gcc/passes.def b/gcc/passes.def
index 2fcd80e53a3..bb0453b36a7 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -403,11 +403,10 @@ along with GCC; see the file COPYING3. If not see
   NEXT_PASS (pass_spectrev1);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
+  TERMINATE_PASS_LIST (all_passes)
 
-  NEXT_PASS (pass_expand);
-
-  NEXT_PASS (pass_rest_of_compilation);
-  PUSH_INSERT_PASSES_WITHIN (pass_rest_of_compilation)
+  INSERT_PASSES_AFTER (pass_rest_of_compilation)
+  NEXT_PASS (pass_expand);
       NEXT_PASS (pass_instantiate_virtual_regs);
       NEXT_PASS (pass_into_cfg_layout_mode);
       NEXT_PASS (pass_jump);
@@ -505,6 +504,5 @@ along with GCC; see the file COPYING3. If not see
       NEXT_PASS (pass_final);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_df_finish);
-  POP_INSERT_PASSES ()
   NEXT_PASS (pass_clean_state);
-  TERMINATE_PASS_LIST (all_passes)
+  TERMINATE_PASS_LIST (pass_rest_of_compilation)

where, to make things "work" again without threading, you'd invoke
execute_pass_list (cfun, g->get_passes ()->pass_rest_of_compilation)
right after the all_passes invocation in cgraph_node::expand.

You can then refactor things so the loop over the 'order' array is done
twice, once over all_passes (the set you then parallelize) and once
over pass_rest_of_compilation (which you can't parallelize because it
is RTL). The above patch needs more changes in the pass manager code -
a chance to dive into it a little, since that's where you'd change
code.
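For illustration, a rough and untested sketch of what the two walks
over the 'order' array could look like after such a split. Note this
is only a sketch against the assumed passes.def change above;
expand_gimple () and expand_rtl () are hypothetical names for the two
halves of cgraph_node::expand, not existing functions:

/* Sketch only - assumes g->get_passes ()->all_passes now stops before
   RTL expansion and a new top-level pass_rest_of_compilation list
   exists, and that cgraph_node::expand has been split into the
   hypothetical helpers expand_gimple () and expand_rtl ().  */

static void
expand_all_functions (void)
{
  /* ... existing setup computing order[] and new_order_pos ...  */

  /* First walk: the GIMPLE part only.  This is the loop that can later
     be handed to a fixed set of worker threads.  */
  for (int i = new_order_pos - 1; i >= 0; i--)
    if (order[i]->process)
      order[i]->expand_gimple ();  /* runs all_passes on the function  */

  /* Second walk: RTL expansion plus the RTL queue, kept serial because
     of the global state on the RTL side.  */
  for (int i = new_order_pos - 1; i >= 0; i--)
    if (order[i]->process)
      {
        order[i]->process = 0;
        order[i]->expand_rtl ();   /* runs pass_rest_of_compilation  */
      }

  /* ... existing cleanup ...  */
}

Whether those helpers live on cgraph_node or the pass manager drives
both walks itself is exactly the design question above.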
> > To simplify the task further, a useful constraint is to not have
> > a single optimization pass executed multiple times at the same time
> > (otherwise you have to look at pass-specific global state as well);
> > thus the parallel part could be coded in a way keeping, per function,
> > the state of what pass to execute next, and have a scheduler pick
> > a function whose next pass is "free", scheduling that to a fixed set
> > of worker threads. There are no dependences between functions
> > for the scheduling, but each pass has only one execution resource
> > in the pipeline. You can start processing an arbitrarily large number
> > of functions, but slow functions will keep others from advancing
> > across the pass they are executing.
> >
>
> Something like a pipeline? That is certainly a start, but if one pass
> is very slow, wouldn't it bottleneck everything?

Yes, something like a pipeline. It's true a slow pass would bottleneck
things - as said, we can selectively make passes thread-safe in such
cases.
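To make the scheduling constraint concrete, here's a small
self-contained toy (plain C++11 threads, nothing GCC-specific;
function_state, run_pass and the pass/worker counts are all made-up
stand-ins). Each function remembers the index of its next pass, each
pass has a single "busy" slot, and a fixed pool of workers repeatedly
picks a function whose next pass is currently free - including the
property that one slow pass stalls the functions queued behind it:

// Toy pipeline scheduler - illustration only, no GCC internals.
// Build with: g++ -std=c++11 -pthread sched.cc

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

struct function_state
{
  int id = 0;
  int next_pass = 0;       // index of the next pass to run on this function
  bool in_flight = false;  // currently owned by a worker?
};

static const int n_passes = 4;   // stand-in for the GIMPLE pass queue length
static const int n_workers = 3;

static std::mutex mtx;
static std::condition_variable cv;
static std::vector<bool> pass_busy (n_passes, false);  // one execution resource per pass
static int n_done = 0;

// Stand-in for running one pass on one function.
static void
run_pass (const function_state &f, int pass)
{
  std::printf ("function %d: pass %d\n", f.id, pass);
}

static void
worker (std::vector<function_state> &funcs)
{
  std::unique_lock<std::mutex> lock (mtx);
  while (n_done < (int) funcs.size ())
    {
      // Pick any function that is idle and whose next pass is free.
      function_state *pick = nullptr;
      for (auto &f : funcs)
        if (!f.in_flight && f.next_pass < n_passes && !pass_busy[f.next_pass])
          {
            pick = &f;
            break;
          }
      if (!pick)
        {
          cv.wait (lock);  // nothing schedulable right now
          continue;
        }

      int pass = pick->next_pass;
      pick->in_flight = true;
      pass_busy[pass] = true;  // claim the pass: only one instance at a time

      lock.unlock ();
      run_pass (*pick, pass);  // the only work done outside the lock
      lock.lock ();

      pick->in_flight = false;
      pass_busy[pass] = false;
      if (++pick->next_pass == n_passes)
        n_done++;          // this function finished the whole queue
      cv.notify_all ();    // a pass and a function just became available
    }
}

int
main ()
{
  std::vector<function_state> funcs (8);
  for (int i = 0; i < (int) funcs.size (); i++)
    funcs[i].id = i;

  std::vector<std::thread> pool;
  for (int i = 0; i < n_workers; i++)
    pool.emplace_back (worker, std::ref (funcs));
  for (auto &t : pool)
    t.join ();
}

In GCC itself the "pick" step would live in the pass manager, and
run_pass would correspond to something like executing a single pass of
the all_passes queue on one function.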
> > Passes could of course be individually marked as thread-safe
> > (multiple instances execute concurrently).
> >
> > Garbage collection is already in control of the pass manager, which
> > would also be the thread scheduler. For GC the remaining issue
> > is allocation, which passes occasionally do. Locking is the
> > short-term solution for GSoC I guess; long-term, per-thread GC pools
> > might be better (to not slow down non-threaded parts of the compiler).
> >
> > Richard.
> >
> > >
> > > Thank you,
> > > Giuliano.
> > >
> > > >
> > > > > IIRC Richard [CCed] was going to mentor, with me co-mentoring [2] -
> > > > > but I don't know if he's still interested/able to spare the cycles.
> > > >
> > > > I've offered mentoring to Giuliano, so yes.
> > > >
> > > > > That said, the parallel compilation one strikes me as very ambitious;
> > > > > it's not clear to me what could realistically be done as a GSoC
> > > > > project. I think a good proposal on that would come up with some
> > > > > subset of the problem that's doable over a summer, whilst also being
> > > > > useful to the project. The RTL infrastructure has a lot of global
> > > > > state, so maybe either focus on the gimple passes, or on fixing
> > > > > global state on the RTL side? (I'm not sure)
> > > >
> > > > That was the original intent for the experiment. There's also
> > > > the already somewhat parallel WPA stage in LTO compilation mode
> > > > (but it simply forks for the sake of simplicity...).
> > > >
> > > > > Or maybe a project to be more explicit about regions of the code
> > > > > that assume that the garbage collector can't run within them? [3]
> > > > > (since the GC is state that would be shared by the threads).
> > > >
> > > > The GC will be one obstacle. The original idea was to drive
> > > > parallelization on the pass level by the pass manager for the
> > > > GIMPLE passes, so serialization points would be in it.
> > > >
> > > > Richard.
> > > >
> > > > > Hope this is constructive/helpful
> > > > > Dave
> > > > >
> > > > > [1] though typically our workflow involves sending patches to the
> > > > > gcc-patches mailing list
> > > > > [2] as libgccjit maintainer I have an interest in global state within
> > > > > the compiler
> > > > > [3] I posted some ideas about this back in 2013 IIRC; probably
> > > > > massively bit-rotted since then. I also gave a talk at Cauldron 2013
> > > > > about global state in the compiler (with a view to
> > > > > gcc-as-a-shared-library); likewise I expect many of the ideas there
> > > > > to be out of date; for libgccjit I went with a different approach
>
> Thank you,
> Giuliano.
>

-- 
Richard Biener
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)