From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 103468 invoked by alias); 7 Apr 2019 15:40:34 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 103299 invoked by uid 89); 7 Apr 2019 15:40:34 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-5.2 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=patience, H*i:sk:DC08D4B, H*M:0f51, H*f:sk:DC08D4B X-HELO: mail-qk1-f177.google.com Received: from mail-qk1-f177.google.com (HELO mail-qk1-f177.google.com) (209.85.222.177) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 07 Apr 2019 15:40:31 +0000 Received: by mail-qk1-f177.google.com with SMTP id g1so6564764qki.5 for ; Sun, 07 Apr 2019 08:40:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=7VDzUEJMYL4v7UI2IHnJqjFSyb/KGldqX328PS5vqSc=; b=P3b0QRjOfbhfTLcjqVJ4cYQZp0xUZgb+n92HiQxhcgwy36DQXvX9a8tL4R8Q/ZB751 DpTK3worLyqf64fatLpA/B2Fat818YjJ/2VqlHbmQrYQqUEtbGs2VMqz3bHsuj0J76Wx HUUaV44gLOJ4kDzot4Nfo+yYpthyaSH4keN839UH1hWYajHaaqr+27j/1CTKoRbleMeI MAH6/rnU1mHtSKoiKTcbAK0mpFCL3mZS2ZjRdpeOmYibuA4QXDguOWsjikKoFNUKKSTm +7wfyail+OWfJxx+LuKK9YbHoV3GcGAtxjhRu9E6h2PB2E6Wnhjk1L6o/iMrFfWWbiz8 O+Iw== Return-Path: Received: from [192.168.1.101] (173-230-163-230.cable.teksavvy.com. [173.230.163.230]) by smtp.googlemail.com with ESMTPSA id s12sm14439623qtj.1.2019.04.07.08.40.28 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 07 Apr 2019 08:40:28 -0700 (PDT) Subject: Re: GSOC Proposal To: Richard Biener Cc: GCC Development References: <4327491a-395b-bee0-145a-eddd8f64b0ba@gmail.com> <63e78666-ceca-94d8-9ac4-101130afab4c@gmail.com> <70316c90-b241-3f88-56d8-9e59f3eac0ee@gmail.com> <038afc5c-85fa-8d78-5b84-289207c7b17f@gmail.com> <1eaa53df-3ed9-2a5f-d6db-2a224ee9da01@gmail.com> From: nick Message-ID: Date: Sun, 07 Apr 2019 15:40:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2019-04/txt/msg00100.txt.bz2 On 2019-04-07 5:31 a.m., Richard Biener wrote: > On April 5, 2019 6:11:15 PM GMT+02:00, nick wrote: >> >> >> On 2019-04-05 6:25 a.m., Richard Biener wrote: >>> On Wed, 3 Apr 2019, nick wrote: >>> >>>> >>>> >>>> On 2019-04-03 7:30 a.m., Richard Biener wrote: >>>>> On Mon, 1 Apr 2019, nick wrote: >>>>> >>>>>> >>>>>> >>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote: >>>>>>> On Mon, 1 Apr 2019, nick wrote: >>>>>>> >>>>>>>> Well I'm talking about the shared roots of this garbage >> collector core state >>>>>>>> data structure or just struct ggc_root_tab. >>>>>>>> >>>>>>>> But also this seems that this to be no longer shared globally if >> I'm not mistaken >>>>>>>> or this: >>>>>>>> static vec extra_root_vec; >>>>>>>> >>>>>>>> Not sure after reading the code which is a bigger deal through >> so I wrote >>>>>>>> my proposal not just asking which is a better issue for not >> being thread >>>>>>>> safe. Sorry about that. >>>>>>>> >>>>>>>> As for the second question injection seems to not be the issue >> or outside >>>>>>>> callers but just internal so phase 3 or step 3 would now be: >>>>>>>> Find internal callers or users of x where x is one of the above >> rather >>>>>>>> than injecting outside callers. Which answers my second question >> about >>>>>>>> external callers being a issue still. >>>>>>>> >>>>>>>> Let me know which of the two is a better issue: >>>>>>>> 1. struct ggc_root_tabs being shared >>>>>>>> 2.static vec extra_root_vec; as a shared >> heap or >>>>>>>> vector of root nodes for each type of allocation >>>>>>>> >>>>>>>> and I will gladly rewrite my proposal sections for that >>>>>>>> as needs to be reedited. >>>>>>> >>>>>>> I don't think working on the garbage collector as a separate >>>>>>> GSoC project is useful at this point. Doing locking around >>>>>>> allocation seems like a good short-term solution and if that >>>>>>> turns out to be a performance issue for the threaded part >>>>>>> using per-thread freelists is likely an easy to deploy >>>>>>> solution. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>> I agree but we were discussing this: >>>>>> Or maybe a project to be more >>>>>> explicit about regions of the code that assume that the garbage- >>>>>> collector can't run within them?[3] (since the GC is state that >> would >>>>>> be shared by the threads). >>>>> >>>>> The process of collecting garbage is not the only issue (and that >>>>> very issue is easiest mitigated by collecting only at specific >>>>> points - which is what we do - and have those be serializing >> points). >>>>> The main issue is the underlying memory allocator (GCC uses memory >>>>> that is garbage collected plus regular heap memory). >>>>> >>>>>> In addition I moved my paper back to our discussion about garbage >> collector >>>>>> state with outside callers.Seems we really need to do something >> about >>>>>> my wording as the idea of my project in a nutshell was to figure >>>>>> out how to mark shared state by callers and inject it into the >>>>>> garbage collector letting it known that the state was not shared >> between >>>>>> threads or shared. Seems that was on the GSoc page and in our >> discussions the issue >>>>>> is marking outside code for shared state. If that's correct then >> my >>>>>> wording of outside callers is incorrect it should have been shared >>>>>> state between threads on outside callers to the garbage collector. >>>>>> If the state is that in your wording above then great as I >> understand >>>>>> where we are going and will gladly change my wording. >>>>> >>>>> I'm still not sure what you are shooting at, the above sentences do >>>>> not make any sense to me. >>>>> >>>>>> Also freelists don't work here as the state is shared at the >> caller's >>>>>> end which would need two major issues: >>>>>> 1. Locking on nodes of the >>>>>> freelists when two threads allocate at the same thing which can be >> a >>>>>> problem if the shared state is shared a lot >>>>>> 2. Locking allocation with >>>>>> large numbers of callers can starve threads >>>>> >>>>> First of all allocating memory from the GC pool is not the main >>>>> work of GIMPLE passes so simply serializing at allocation time >> might >>>>> work out. Second free lists of course do work. What you'd do is >>>>> have a fast path in allocation using a thread-local "free list" >>>>> which you can allocate from without taking any lock. Maybe I >> should >>>>> explain "free list" since that term doesn't make too much sense in >>>>> a garbage collector world. What I'd do is when a client thread >>>>> asks for memory of size N allocate M objects of that size but put >>>>> M - 1 on the client thread local "free list" to be allocated >> lock-free >>>>> from for the next M - 1 calls. Note that garbage collected memory >>>>> objects are only handed out in fixed chunks (powers of two plus >>>>> a few special sizes) so you'd have one "free list" per chunk size >>>>> per thread. >>>>> >>>>> The collection itself (mark & sweep) would be fully serialized >> still >>>>> (and not return to any threads local "free list"). >>>>> >>>>> ggc_free'd objects _might_ go to the threads "free list"s (yeah, we >>>>> _do_ have ggc_free ...). >>>>> >>>>> As said, I don't see GC or the memory allocator as sth interesting >>>>> to work on for parallelization until the basic setup works and it >>>>> proves to be a bottleneck. >>>>> >>>>>> Seems that working on the garbage collector itself isn't the issue >> but >>>>>> the callers as I just figured out as related to your state idea. >> Let me >>>>>> know if that's correct and if the wording change I mentioned is >> fine >>>>>> with you as that's the state it seems that needs to be changed. >>>>>> Nick >>>>> >>>>> Richard. >>>>> >>>> >>>> That's fine and it's my fault for not understanding you better. I >> was aware >>>> of the expand_functions_all being taken for passes.c. However it >> seems >>>> two other issues are these sets as related to threads: >>>> 1.finalize_compilation_unit >>>> 2.and the ipa set of pass functions >>>> >>>> If I'm understanding it correctly number 1 seems to be a early >> version of >>>> expand_all_functions for the GENERIC representation if that's the >> case >>>> it really should be fixed. Not sure which is a better issue as both >>>> seem to have issues either at the GENERIC level or GIMPLE level with >> shared >>>> state. >>>> >>>> Let me know if this is better as it seems now that I really think >> about >>>> it GIMPLE or GENERIC functions in passes.c are the main issue. >>>> >>>> Sorry for the misunderstanding and hopefully one of functions listed >> is better >>>> for moving forward with my proposal, >>> >>> Sorry, but guessing at useful projects by skimming through GCC code >>> at this point isn't the way to go forward - this new "idea" lacks >>> both detail and understanding. Please try to stick to one of the >>> suggested projects or do more thorough research in case you want >>> to work on a new project idea next year. >>> >>> Thanks, >>> Richard. >>> >> >> I was talking about cgraphunits.c and it seems that according to this: >> Parallelize compilation using threads. GCC currently has an awful lot >> of truly global state and even more per-pass global state which makes >> this somewhat hard. The idea is to avoid issues with global state as >> much as possible by partitioning the compilation pipeline in pieces >> that share as little global state as possible and ensure each thread >> only works in one of those partitions. The biggest roadblock will be >> the not thread-safe memory allocator of GCC garbage collector. The goal >> of this project is to have a compilation pipeline driven by a scheduler >> assigning functions to be optimized to the partitions in the pipeline. >> This project would be mentored by Richard Biener. Required skills >> include: C/C++, ability to analyze big complex code base, >> parallelization >> >> We are trying to create a rendering pipeline if I'm correct and it >> seems that the GENERIC level needs finalize_compilation_unit >> to be fixed like expand_all_functions at the GIMPLE. That's my point it >> still is within that project. Here is what I wrote >> as I figured out that it was shared state related to GENERIC passing to >> GIMPLE which is a bottleneck or would be in the >> threaded pipeline. >> >> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit > > The pre- post-IPA parts cannot be easily parallelized and that includes GENERIC to GIMPLE translation. This is why the project should focus on the much easier post-IPA and pre-RTL parts of the compilation pipeline since there interaction between functions is minimal. > > Richard. > >> >> Nick > Richard, Thanks seems we finally got somewhere I rewrote it for that. Please just doubt check tomorrow before I submit later tomorrow as I would just want to make sure it's OK with you. Here is the link: https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit The only thing is if you want more detail in the project part as I just think it's enough but let me know if not. Thanks for your patience, Nick