From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 125876 invoked by alias); 8 Apr 2019 07:30:04 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 125575 invoked by uid 89); 8 Apr 2019 07:30:04 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-5.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 spammy=Late, mid, rendering, goal X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 08 Apr 2019 07:30:01 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 78018B028; Mon, 8 Apr 2019 07:29:59 +0000 (UTC) Date: Mon, 08 Apr 2019 07:30:00 -0000 From: Richard Biener To: nick cc: GCC Development Subject: Re: GSOC Proposal In-Reply-To: Message-ID: References: <4327491a-395b-bee0-145a-eddd8f64b0ba@gmail.com> <63e78666-ceca-94d8-9ac4-101130afab4c@gmail.com> <70316c90-b241-3f88-56d8-9e59f3eac0ee@gmail.com> <038afc5c-85fa-8d78-5b84-289207c7b17f@gmail.com> <1eaa53df-3ed9-2a5f-d6db-2a224ee9da01@gmail.com> User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-SW-Source: 2019-04/txt/msg00105.txt.bz2 On Sun, 7 Apr 2019, nick wrote: > > > On 2019-04-07 5:31 a.m., Richard Biener wrote: > > On April 5, 2019 6:11:15 PM GMT+02:00, nick wrote: > >> > >> > >> On 2019-04-05 6:25 a.m., Richard Biener wrote: > >>> On Wed, 3 Apr 2019, nick wrote: > >>> > >>>> > >>>> > >>>> On 2019-04-03 7:30 a.m., Richard Biener wrote: > >>>>> On Mon, 1 Apr 2019, nick wrote: > >>>>> > >>>>>> > >>>>>> > >>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote: > >>>>>>> On Mon, 1 Apr 2019, nick wrote: > >>>>>>> > >>>>>>>> Well I'm talking about the shared roots of this garbage > >> collector core state > >>>>>>>> data structure or just struct ggc_root_tab. > >>>>>>>> > >>>>>>>> But also this seems that this to be no longer shared globally if > >> I'm not mistaken > >>>>>>>> or this: > >>>>>>>> static vec extra_root_vec; > >>>>>>>> > >>>>>>>> Not sure after reading the code which is a bigger deal through > >> so I wrote > >>>>>>>> my proposal not just asking which is a better issue for not > >> being thread > >>>>>>>> safe. Sorry about that. > >>>>>>>> > >>>>>>>> As for the second question injection seems to not be the issue > >> or outside > >>>>>>>> callers but just internal so phase 3 or step 3 would now be: > >>>>>>>> Find internal callers or users of x where x is one of the above > >> rather > >>>>>>>> than injecting outside callers. Which answers my second question > >> about > >>>>>>>> external callers being a issue still. > >>>>>>>> > >>>>>>>> Let me know which of the two is a better issue: > >>>>>>>> 1. struct ggc_root_tabs being shared > >>>>>>>> 2.static vec extra_root_vec; as a shared > >> heap or > >>>>>>>> vector of root nodes for each type of allocation > >>>>>>>> > >>>>>>>> and I will gladly rewrite my proposal sections for that > >>>>>>>> as needs to be reedited. > >>>>>>> > >>>>>>> I don't think working on the garbage collector as a separate > >>>>>>> GSoC project is useful at this point. Doing locking around > >>>>>>> allocation seems like a good short-term solution and if that > >>>>>>> turns out to be a performance issue for the threaded part > >>>>>>> using per-thread freelists is likely an easy to deploy > >>>>>>> solution. > >>>>>>> > >>>>>>> Richard. > >>>>>>> > >>>>>> I agree but we were discussing this: > >>>>>> Or maybe a project to be more > >>>>>> explicit about regions of the code that assume that the garbage- > >>>>>> collector can't run within them?[3] (since the GC is state that > >> would > >>>>>> be shared by the threads). > >>>>> > >>>>> The process of collecting garbage is not the only issue (and that > >>>>> very issue is easiest mitigated by collecting only at specific > >>>>> points - which is what we do - and have those be serializing > >> points). > >>>>> The main issue is the underlying memory allocator (GCC uses memory > >>>>> that is garbage collected plus regular heap memory). > >>>>> > >>>>>> In addition I moved my paper back to our discussion about garbage > >> collector > >>>>>> state with outside callers.Seems we really need to do something > >> about > >>>>>> my wording as the idea of my project in a nutshell was to figure > >>>>>> out how to mark shared state by callers and inject it into the > >>>>>> garbage collector letting it known that the state was not shared > >> between > >>>>>> threads or shared. Seems that was on the GSoc page and in our > >> discussions the issue > >>>>>> is marking outside code for shared state. If that's correct then > >> my > >>>>>> wording of outside callers is incorrect it should have been shared > >>>>>> state between threads on outside callers to the garbage collector. > >>>>>> If the state is that in your wording above then great as I > >> understand > >>>>>> where we are going and will gladly change my wording. > >>>>> > >>>>> I'm still not sure what you are shooting at, the above sentences do > >>>>> not make any sense to me. > >>>>> > >>>>>> Also freelists don't work here as the state is shared at the > >> caller's > >>>>>> end which would need two major issues: > >>>>>> 1. Locking on nodes of the > >>>>>> freelists when two threads allocate at the same thing which can be > >> a > >>>>>> problem if the shared state is shared a lot > >>>>>> 2. Locking allocation with > >>>>>> large numbers of callers can starve threads > >>>>> > >>>>> First of all allocating memory from the GC pool is not the main > >>>>> work of GIMPLE passes so simply serializing at allocation time > >> might > >>>>> work out. Second free lists of course do work. What you'd do is > >>>>> have a fast path in allocation using a thread-local "free list" > >>>>> which you can allocate from without taking any lock. Maybe I > >> should > >>>>> explain "free list" since that term doesn't make too much sense in > >>>>> a garbage collector world. What I'd do is when a client thread > >>>>> asks for memory of size N allocate M objects of that size but put > >>>>> M - 1 on the client thread local "free list" to be allocated > >> lock-free > >>>>> from for the next M - 1 calls. Note that garbage collected memory > >>>>> objects are only handed out in fixed chunks (powers of two plus > >>>>> a few special sizes) so you'd have one "free list" per chunk size > >>>>> per thread. > >>>>> > >>>>> The collection itself (mark & sweep) would be fully serialized > >> still > >>>>> (and not return to any threads local "free list"). > >>>>> > >>>>> ggc_free'd objects _might_ go to the threads "free list"s (yeah, we > >>>>> _do_ have ggc_free ...). > >>>>> > >>>>> As said, I don't see GC or the memory allocator as sth interesting > >>>>> to work on for parallelization until the basic setup works and it > >>>>> proves to be a bottleneck. > >>>>> > >>>>>> Seems that working on the garbage collector itself isn't the issue > >> but > >>>>>> the callers as I just figured out as related to your state idea. > >> Let me > >>>>>> know if that's correct and if the wording change I mentioned is > >> fine > >>>>>> with you as that's the state it seems that needs to be changed. > >>>>>> Nick > >>>>> > >>>>> Richard. > >>>>> > >>>> > >>>> That's fine and it's my fault for not understanding you better. I > >> was aware > >>>> of the expand_functions_all being taken for passes.c. However it > >> seems > >>>> two other issues are these sets as related to threads: > >>>> 1.finalize_compilation_unit > >>>> 2.and the ipa set of pass functions > >>>> > >>>> If I'm understanding it correctly number 1 seems to be a early > >> version of > >>>> expand_all_functions for the GENERIC representation if that's the > >> case > >>>> it really should be fixed. Not sure which is a better issue as both > >>>> seem to have issues either at the GENERIC level or GIMPLE level with > >> shared > >>>> state. > >>>> > >>>> Let me know if this is better as it seems now that I really think > >> about > >>>> it GIMPLE or GENERIC functions in passes.c are the main issue. > >>>> > >>>> Sorry for the misunderstanding and hopefully one of functions listed > >> is better > >>>> for moving forward with my proposal, > >>> > >>> Sorry, but guessing at useful projects by skimming through GCC code > >>> at this point isn't the way to go forward - this new "idea" lacks > >>> both detail and understanding. Please try to stick to one of the > >>> suggested projects or do more thorough research in case you want > >>> to work on a new project idea next year. > >>> > >>> Thanks, > >>> Richard. > >>> > >> > >> I was talking about cgraphunits.c and it seems that according to this: > >> Parallelize compilation using threads. GCC currently has an awful lot > >> of truly global state and even more per-pass global state which makes > >> this somewhat hard. The idea is to avoid issues with global state as > >> much as possible by partitioning the compilation pipeline in pieces > >> that share as little global state as possible and ensure each thread > >> only works in one of those partitions. The biggest roadblock will be > >> the not thread-safe memory allocator of GCC garbage collector. The goal > >> of this project is to have a compilation pipeline driven by a scheduler > >> assigning functions to be optimized to the partitions in the pipeline. > >> This project would be mentored by Richard Biener. Required skills > >> include: C/C++, ability to analyze big complex code base, > >> parallelization > >> > >> We are trying to create a rendering pipeline if I'm correct and it > >> seems that the GENERIC level needs finalize_compilation_unit > >> to be fixed like expand_all_functions at the GIMPLE. That's my point it > >> still is within that project. Here is what I wrote > >> as I figured out that it was shared state related to GENERIC passing to > >> GIMPLE which is a bottleneck or would be in the > >> threaded pipeline. > >> > >> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit > > > > The pre- post-IPA parts cannot be easily parallelized and that includes GENERIC to GIMPLE translation. This is why the project should focus on the much easier post-IPA and pre-RTL parts of the compilation pipeline since there interaction between functions is minimal. > > > > Richard. > > > >> > >> Nick > > > Richard, > Thanks seems we finally got somewhere I rewrote it for that. Please just doubt check tomorrow before I submit later tomorrow > as I would just want to make sure it's OK with you. Here is the link: > https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit > > The only thing is if you want more detail in the project part as I just think it's > enough but let me know if not. It seems the project part lacks any specifics and those are to be researched and discovered in the Mid may - Late May timeframe of your Roadmap? I do not understand what "Create optimized functions ... for avoiding sharing of state ..." is about - what kind of functions do you intend to write? Often examples help. Richard. > Thanks for your patience, > Nick