From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 24119 invoked by alias); 8 Apr 2019 13:42:21 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 24111 invoked by uid 89); 8 Apr 2019 13:42:21 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-5.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 spammy=H*i:sk:4a63684, H*f:sk:4a63684 X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 08 Apr 2019 13:42:18 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 9427BAF67; Mon, 8 Apr 2019 13:42:16 +0000 (UTC) Date: Mon, 08 Apr 2019 13:42:00 -0000 From: Richard Biener To: nick cc: GCC Development Subject: Re: GSOC Proposal In-Reply-To: <4a636849-7aa8-e0a2-2ac9-63ad17e56ccc@gmail.com> Message-ID: References: <4327491a-395b-bee0-145a-eddd8f64b0ba@gmail.com> <63e78666-ceca-94d8-9ac4-101130afab4c@gmail.com> <70316c90-b241-3f88-56d8-9e59f3eac0ee@gmail.com> <038afc5c-85fa-8d78-5b84-289207c7b17f@gmail.com> <1eaa53df-3ed9-2a5f-d6db-2a224ee9da01@gmail.com> <4a636849-7aa8-e0a2-2ac9-63ad17e56ccc@gmail.com> User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-SW-Source: 2019-04/txt/msg00119.txt.bz2 On Mon, 8 Apr 2019, nick wrote: > > > On 2019-04-08 3:29 a.m., Richard Biener wrote: > > On Sun, 7 Apr 2019, nick wrote: > > > >> > >> > >> On 2019-04-07 5:31 a.m., Richard Biener wrote: > >>> On April 5, 2019 6:11:15 PM GMT+02:00, nick wrote: > >>>> > >>>> > >>>> On 2019-04-05 6:25 a.m., Richard Biener wrote: > >>>>> On Wed, 3 Apr 2019, nick wrote: > >>>>> > >>>>>> > >>>>>> > >>>>>> On 2019-04-03 7:30 a.m., Richard Biener wrote: > >>>>>>> On Mon, 1 Apr 2019, nick wrote: > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote: > >>>>>>>>> On Mon, 1 Apr 2019, nick wrote: > >>>>>>>>> > >>>>>>>>>> Well I'm talking about the shared roots of this garbage > >>>> collector core state > >>>>>>>>>> data structure or just struct ggc_root_tab. > >>>>>>>>>> > >>>>>>>>>> But also this seems that this to be no longer shared globally if > >>>> I'm not mistaken > >>>>>>>>>> or this: > >>>>>>>>>> static vec extra_root_vec; > >>>>>>>>>> > >>>>>>>>>> Not sure after reading the code which is a bigger deal through > >>>> so I wrote > >>>>>>>>>> my proposal not just asking which is a better issue for not > >>>> being thread > >>>>>>>>>> safe. Sorry about that. > >>>>>>>>>> > >>>>>>>>>> As for the second question injection seems to not be the issue > >>>> or outside > >>>>>>>>>> callers but just internal so phase 3 or step 3 would now be: > >>>>>>>>>> Find internal callers or users of x where x is one of the above > >>>> rather > >>>>>>>>>> than injecting outside callers. Which answers my second question > >>>> about > >>>>>>>>>> external callers being a issue still. > >>>>>>>>>> > >>>>>>>>>> Let me know which of the two is a better issue: > >>>>>>>>>> 1. struct ggc_root_tabs being shared > >>>>>>>>>> 2.static vec extra_root_vec; as a shared > >>>> heap or > >>>>>>>>>> vector of root nodes for each type of allocation > >>>>>>>>>> > >>>>>>>>>> and I will gladly rewrite my proposal sections for that > >>>>>>>>>> as needs to be reedited. > >>>>>>>>> > >>>>>>>>> I don't think working on the garbage collector as a separate > >>>>>>>>> GSoC project is useful at this point. Doing locking around > >>>>>>>>> allocation seems like a good short-term solution and if that > >>>>>>>>> turns out to be a performance issue for the threaded part > >>>>>>>>> using per-thread freelists is likely an easy to deploy > >>>>>>>>> solution. > >>>>>>>>> > >>>>>>>>> Richard. > >>>>>>>>> > >>>>>>>> I agree but we were discussing this: > >>>>>>>> Or maybe a project to be more > >>>>>>>> explicit about regions of the code that assume that the garbage- > >>>>>>>> collector can't run within them?[3] (since the GC is state that > >>>> would > >>>>>>>> be shared by the threads). > >>>>>>> > >>>>>>> The process of collecting garbage is not the only issue (and that > >>>>>>> very issue is easiest mitigated by collecting only at specific > >>>>>>> points - which is what we do - and have those be serializing > >>>> points). > >>>>>>> The main issue is the underlying memory allocator (GCC uses memory > >>>>>>> that is garbage collected plus regular heap memory). > >>>>>>> > >>>>>>>> In addition I moved my paper back to our discussion about garbage > >>>> collector > >>>>>>>> state with outside callers.Seems we really need to do something > >>>> about > >>>>>>>> my wording as the idea of my project in a nutshell was to figure > >>>>>>>> out how to mark shared state by callers and inject it into the > >>>>>>>> garbage collector letting it known that the state was not shared > >>>> between > >>>>>>>> threads or shared. Seems that was on the GSoc page and in our > >>>> discussions the issue > >>>>>>>> is marking outside code for shared state. If that's correct then > >>>> my > >>>>>>>> wording of outside callers is incorrect it should have been shared > >>>>>>>> state between threads on outside callers to the garbage collector. > >>>>>>>> If the state is that in your wording above then great as I > >>>> understand > >>>>>>>> where we are going and will gladly change my wording. > >>>>>>> > >>>>>>> I'm still not sure what you are shooting at, the above sentences do > >>>>>>> not make any sense to me. > >>>>>>> > >>>>>>>> Also freelists don't work here as the state is shared at the > >>>> caller's > >>>>>>>> end which would need two major issues: > >>>>>>>> 1. Locking on nodes of the > >>>>>>>> freelists when two threads allocate at the same thing which can be > >>>> a > >>>>>>>> problem if the shared state is shared a lot > >>>>>>>> 2. Locking allocation with > >>>>>>>> large numbers of callers can starve threads > >>>>>>> > >>>>>>> First of all allocating memory from the GC pool is not the main > >>>>>>> work of GIMPLE passes so simply serializing at allocation time > >>>> might > >>>>>>> work out. Second free lists of course do work. What you'd do is > >>>>>>> have a fast path in allocation using a thread-local "free list" > >>>>>>> which you can allocate from without taking any lock. Maybe I > >>>> should > >>>>>>> explain "free list" since that term doesn't make too much sense in > >>>>>>> a garbage collector world. What I'd do is when a client thread > >>>>>>> asks for memory of size N allocate M objects of that size but put > >>>>>>> M - 1 on the client thread local "free list" to be allocated > >>>> lock-free > >>>>>>> from for the next M - 1 calls. Note that garbage collected memory > >>>>>>> objects are only handed out in fixed chunks (powers of two plus > >>>>>>> a few special sizes) so you'd have one "free list" per chunk size > >>>>>>> per thread. > >>>>>>> > >>>>>>> The collection itself (mark & sweep) would be fully serialized > >>>> still > >>>>>>> (and not return to any threads local "free list"). > >>>>>>> > >>>>>>> ggc_free'd objects _might_ go to the threads "free list"s (yeah, we > >>>>>>> _do_ have ggc_free ...). > >>>>>>> > >>>>>>> As said, I don't see GC or the memory allocator as sth interesting > >>>>>>> to work on for parallelization until the basic setup works and it > >>>>>>> proves to be a bottleneck. > >>>>>>> > >>>>>>>> Seems that working on the garbage collector itself isn't the issue > >>>> but > >>>>>>>> the callers as I just figured out as related to your state idea. > >>>> Let me > >>>>>>>> know if that's correct and if the wording change I mentioned is > >>>> fine > >>>>>>>> with you as that's the state it seems that needs to be changed. > >>>>>>>> Nick > >>>>>>> > >>>>>>> Richard. > >>>>>>> > >>>>>> > >>>>>> That's fine and it's my fault for not understanding you better. I > >>>> was aware > >>>>>> of the expand_functions_all being taken for passes.c. However it > >>>> seems > >>>>>> two other issues are these sets as related to threads: > >>>>>> 1.finalize_compilation_unit > >>>>>> 2.and the ipa set of pass functions > >>>>>> > >>>>>> If I'm understanding it correctly number 1 seems to be a early > >>>> version of > >>>>>> expand_all_functions for the GENERIC representation if that's the > >>>> case > >>>>>> it really should be fixed. Not sure which is a better issue as both > >>>>>> seem to have issues either at the GENERIC level or GIMPLE level with > >>>> shared > >>>>>> state. > >>>>>> > >>>>>> Let me know if this is better as it seems now that I really think > >>>> about > >>>>>> it GIMPLE or GENERIC functions in passes.c are the main issue. > >>>>>> > >>>>>> Sorry for the misunderstanding and hopefully one of functions listed > >>>> is better > >>>>>> for moving forward with my proposal, > >>>>> > >>>>> Sorry, but guessing at useful projects by skimming through GCC code > >>>>> at this point isn't the way to go forward - this new "idea" lacks > >>>>> both detail and understanding. Please try to stick to one of the > >>>>> suggested projects or do more thorough research in case you want > >>>>> to work on a new project idea next year. > >>>>> > >>>>> Thanks, > >>>>> Richard. > >>>>> > >>>> > >>>> I was talking about cgraphunits.c and it seems that according to this: > >>>> Parallelize compilation using threads. GCC currently has an awful lot > >>>> of truly global state and even more per-pass global state which makes > >>>> this somewhat hard. The idea is to avoid issues with global state as > >>>> much as possible by partitioning the compilation pipeline in pieces > >>>> that share as little global state as possible and ensure each thread > >>>> only works in one of those partitions. The biggest roadblock will be > >>>> the not thread-safe memory allocator of GCC garbage collector. The goal > >>>> of this project is to have a compilation pipeline driven by a scheduler > >>>> assigning functions to be optimized to the partitions in the pipeline. > >>>> This project would be mentored by Richard Biener. Required skills > >>>> include: C/C++, ability to analyze big complex code base, > >>>> parallelization > >>>> > >>>> We are trying to create a rendering pipeline if I'm correct and it > >>>> seems that the GENERIC level needs finalize_compilation_unit > >>>> to be fixed like expand_all_functions at the GIMPLE. That's my point it > >>>> still is within that project. Here is what I wrote > >>>> as I figured out that it was shared state related to GENERIC passing to > >>>> GIMPLE which is a bottleneck or would be in the > >>>> threaded pipeline. > >>>> > >>>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit > >>> > >>> The pre- post-IPA parts cannot be easily parallelized and that includes GENERIC to GIMPLE translation. This is why the project should focus on the much easier post-IPA and pre-RTL parts of the compilation pipeline since there interaction between functions is minimal. > >>> > >>> Richard. > >>> > >>>> > >>>> Nick > >>> > >> Richard, > >> Thanks seems we finally got somewhere I rewrote it for that. Please just doubt check tomorrow before I submit later tomorrow > >> as I would just want to make sure it's OK with you. Here is the link: > >> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit > >> > >> The only thing is if you want more detail in the project part as I just think it's > >> enough but let me know if not. > > > > It seems the project part lacks any specifics and those are to be > > researched and discovered in the Mid may - Late May timeframe of > > your Roadmap? > Yes this is because after looking into ipa functions it would take > some time to really understand it not just at a surface level. It's > not the functions themselves but thinking about how the pipeline > would work in the POST IPA and pre RTL functions. > > > I do not understand what "Create optimized functions ... for avoiding > > sharing of state ..." is about - what kind of functions do you intend > > to write? Often examples help. > > > Fix functions at both the post IPA and pre RTL levels for avoiding sharing of state and increasing > throughput with multiple threads in the compiler is what I actually meant. There aren't any new > functions it's optimizing the paths. So it was a small typo. Thanks for noticing it. > > I will submit either later today or early tomorrow so if there's anything else let me know, I think you will have to explain how you want to avoid sharing of state and at least vaguely _what_ global state you want to avoid. The other proposal side-steps global state of individual GIMPLE passes by never executing the same GIMPLE pass twice at the same time for example, so it's the scheduler avoiding shared state. What other global state do you have in mind? Richard.