Subject: Re: GSOC Proposal
To: Richard Biener
Cc: GCC Development
From: nick
Date: Mon, 08 Apr 2019 14:17:00 -0000
Message-ID: <9d60ac29-ea75-5552-6132-c9f60cde629c@gmail.com>

On 2019-04-08 9:42 a.m., Richard Biener wrote:
> On Mon, 8 Apr 2019, nick wrote:
>> On 2019-04-08 3:29 a.m., Richard Biener wrote:
>>> On Sun, 7 Apr 2019, nick wrote:
>>>> On 2019-04-07 5:31 a.m., Richard Biener wrote:
>>>>> On April 5, 2019 6:11:15 PM GMT+02:00, nick wrote:
>>>>>> On 2019-04-05 6:25 a.m., Richard Biener wrote:
>>>>>>> On Wed, 3 Apr 2019, nick wrote:
>>>>>>>> On 2019-04-03 7:30 a.m., Richard Biener wrote:
>>>>>>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>>>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote:
>>>>>>>>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>>>>>>>>> Well, I'm talking about the shared roots of this garbage collector core state data structure, or just struct ggc_root_tab.
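For readers following along, the root table under discussion has roughly this shape (paraphrased from gcc/ggc.h from memory, so treat it as an illustration rather than the authoritative definition):

#include <stddef.h>

typedef void (*gt_pointer_walker) (void *);

/* Approximate shape of one GC root-table entry: a block of root
   pointers plus the callbacks the collector uses to mark through
   them and to handle precompiled headers.  */
struct ggc_root_tab
{
  void *base;              /* address of the first root in the block */
  size_t nelt;             /* number of roots in the block */
  size_t stride;           /* size in bytes of each root slot */
  gt_pointer_walker cb;    /* marking routine called for each root */
  gt_pointer_walker pchw;  /* PCH walking routine */
};

Tables like this (and extra_root_vec) are process-wide, which is why the discussion treats them as state shared by any would-be compilation threads.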
>>>>>>>>>>>> But it also seems that this is no longer shared globally, if I'm not mistaken, or this: static vec extra_root_vec;
>>>>>>>>>>>> After reading the code I'm not sure which is the bigger deal, though, so I wrote my proposal without asking which is the better issue for thread safety. Sorry about that.
>>>>>>>>>>>> As for the second question, injection seems not to be the issue, nor outside callers, but just internal ones, so phase 3 or step 3 would now be: find internal callers or users of x, where x is one of the above, rather than injecting outside callers. Which answers my second question about external callers still being an issue.
>>>>>>>>>>>> Let me know which of the two is the better issue:
>>>>>>>>>>>> 1. struct ggc_root_tab being shared
>>>>>>>>>>>> 2. static vec extra_root_vec; as a shared heap or vector of root nodes for each type of allocation
>>>>>>>>>>>> and I will gladly rewrite the proposal sections that need to be re-edited.
>>>>>>>>>>>
>>>>>>>>>>> I don't think working on the garbage collector as a separate GSoC project is useful at this point. Doing locking around allocation seems like a good short-term solution and if that turns out to be a performance issue for the threaded part using per-thread freelists is likely an easy-to-deploy solution.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>> I agree, but we were discussing this: "Or maybe a project to be more explicit about regions of the code that assume that the garbage collector can't run within them?" [3] (since the GC is state that would be shared by the threads).
>>>>>>>>>
>>>>>>>>> The process of collecting garbage is not the only issue (and that very issue is easiest mitigated by collecting only at specific points - which is what we do - and have those be serializing points). The main issue is the underlying memory allocator (GCC uses memory that is garbage collected plus regular heap memory).
>>>>>>>>>>
>>>>>>>>>> In addition I moved my paper back to our discussion about garbage collector state with outside callers. It seems we really need to do something about my wording, as the idea of my project in a nutshell was to figure out how to mark shared state by callers and inject that into the garbage collector, letting it know whether or not the state was shared between threads. That was on the GSoC page, and in our discussions the issue is marking outside code for shared state. If that's correct then my wording of "outside callers" is wrong; it should have been shared state between threads on outside callers to the garbage collector. If that matches your wording above then great, as I understand where we are going and will gladly change my wording.
>>>>>>>>>
>>>>>>>>> I'm still not sure what you are shooting at; the above sentences do not make any sense to me.
>>>>>>>>>>
>>>>>>>>>> Also, freelists don't work here as the state is shared at the caller's end, which raises two major issues:
>>>>>>>>>> 1. Locking on nodes of the freelists when two threads allocate the same thing, which can be a problem if the state is shared a lot.
>>>>>>>>>> 2. Locking allocation with large numbers of callers can starve threads.
>>>>>>>>>
>>>>>>>>> First of all, allocating memory from the GC pool is not the main work of GIMPLE passes, so simply serializing at allocation time might work out. Second, free lists of course do work. What you'd do is have a fast path in allocation using a thread-local "free list" which you can allocate from without taking any lock. Maybe I should explain "free list" since that term doesn't make too much sense in a garbage collector world. What I'd do is, when a client thread asks for memory of size N, allocate M objects of that size but put M - 1 on the client thread's local "free list" to be allocated from, lock-free, for the next M - 1 calls. Note that garbage collected memory objects are only handed out in fixed chunks (powers of two plus a few special sizes) so you'd have one "free list" per chunk size per thread.
>>>>>>>>>
>>>>>>>>> The collection itself (mark & sweep) would be fully serialized still (and not return to any thread's local "free list").
>>>>>>>>>
>>>>>>>>> ggc_free'd objects _might_ go to the thread's "free lists" (yeah, we _do_ have ggc_free ...).
>>>>>>>>>
>>>>>>>>> As said, I don't see GC or the memory allocator as something interesting to work on for parallelization until the basic setup works and it proves to be a bottleneck.
>>>>>>>>>>
>>>>>>>>>> It seems that working on the garbage collector itself isn't the issue but rather the callers, as I just figured out in relation to your state idea. Let me know if that's correct and if the wording change I mentioned is fine with you, as that's the state it seems needs to be changed.
>>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>
>>>>>>>> That's fine, and it's my fault for not understanding you better. I was aware of expand_all_functions being taken for passes.c. However, it seems two other issues related to threads are these sets:
>>>>>>>> 1. finalize_compilation_unit
>>>>>>>> 2. the ipa set of pass functions
>>>>>>>> If I'm understanding it correctly, number 1 seems to be an early version of expand_all_functions for the GENERIC representation; if that's the case it really should be fixed. I'm not sure which is the better issue, as both seem to have problems with shared state, either at the GENERIC level or the GIMPLE level.
>>>>>>>> Let me know if this is better, as it seems, now that I really think about it, that the GIMPLE or GENERIC functions in passes.c are the main issue.
>>>>>>>> Sorry for the misunderstanding, and hopefully one of the functions listed is better for moving forward with my proposal,
>>>>>>>
>>>>>>> Sorry, but guessing at useful projects by skimming through GCC code at this point isn't the way to go forward - this new "idea" lacks both detail and understanding. Please try to stick to one of the suggested projects or do more thorough research in case you want to work on a new project idea next year.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
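To make the per-thread "free list" scheme described above concrete, here is a minimal sketch assuming a single lock around the shared pool; gc_pool_alloc, thread_ggc_alloc and NUM_CHUNK_SIZES are made-up names, not GCC's real allocator entry points:

#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

static std::mutex gc_mutex;                  // serializes the shared GC pool

// Stand-in for the (locked) shared allocator; GCC's real GC pool would
// hand out pages and carve them into fixed-size chunks.
static void *
gc_pool_alloc (std::size_t size)
{
  return std::malloc (size);
}

static const std::size_t NUM_CHUNK_SIZES = 64;

// One cache of pre-allocated objects per chunk size, per thread.
static thread_local std::vector<void *> cache[NUM_CHUNK_SIZES];

void *
thread_ggc_alloc (std::size_t chunk_index, std::size_t chunk_size)
{
  std::vector<void *> &fl = cache[chunk_index];
  if (!fl.empty ())                          // fast path: no lock taken
    {
      void *p = fl.back ();
      fl.pop_back ();
      return p;
    }

  // Slow path: lock once, allocate M objects of this chunk size and keep
  // M - 1 of them for the next M - 1 lock-free calls from this thread.
  const std::size_t M = 64;
  std::lock_guard<std::mutex> guard (gc_mutex);
  void *first = gc_pool_alloc (chunk_size);
  for (std::size_t i = 1; i < M; ++i)
    fl.push_back (gc_pool_alloc (chunk_size));
  return first;
}

As described above, collection itself (mark & sweep) would stay fully serialized, and a real version would also have to keep the cached objects alive across a collection, or flush the per-thread caches at the serialization points.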
>>>>>>
>>>>>> I was talking about cgraphunit.c, and it seems that, according to this:
>>>>>> "Parallelize compilation using threads. GCC currently has an awful lot of truly global state and even more per-pass global state which makes this somewhat hard. The idea is to avoid issues with global state as much as possible by partitioning the compilation pipeline in pieces that share as little global state as possible and ensure each thread only works in one of those partitions. The biggest roadblock will be the not thread-safe memory allocator of GCC's garbage collector. The goal of this project is to have a compilation pipeline driven by a scheduler assigning functions to be optimized to the partitions in the pipeline. This project would be mentored by Richard Biener. Required skills include: C/C++, ability to analyze a big complex code base, parallelization."
>>>>>> we are trying to create a compilation pipeline, if I'm correct, and it seems that the GENERIC level needs finalize_compilation_unit to be fixed like expand_all_functions at the GIMPLE level. That's my point: it is still within that project. Here is what I wrote, as I figured out that it is shared state in the GENERIC-to-GIMPLE handoff that is, or would be, a bottleneck in the threaded pipeline.
>>>>>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit
>>>>>
>>>>> The pre-IPA parts cannot be easily parallelized, and that includes GENERIC to GIMPLE translation. This is why the project should focus on the much easier post-IPA and pre-RTL parts of the compilation pipeline, since there the interaction between functions is minimal.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Nick
>>>>
>>>> Richard,
>>>> Thanks, it seems we finally got somewhere; I rewrote it for that. Please just double-check tomorrow before I submit later tomorrow, as I want to make sure it's OK with you. Here is the link:
>>>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit
>>>> The only question is whether you want more detail in the project part; I think it's enough, but let me know if not.
>>>
>>> It seems the project part lacks any specifics, and those are to be researched and discovered in the mid-May to late May timeframe of your roadmap?
>>
>> Yes, this is because after looking into the IPA functions it would take some time to really understand them, not just at a surface level. It's not the functions themselves but thinking about how the pipeline would work in the post-IPA and pre-RTL functions.
>>
>>> I do not understand what "Create optimized functions ... for avoiding sharing of state ..." is about - what kind of functions do you intend to write? Often examples help.
>>
>> What I actually meant was to fix functions at both the post-IPA and pre-RTL levels to avoid sharing of state and increase throughput with multiple threads in the compiler. There aren't any new functions; it's optimizing the existing paths. So it was a small typo. Thanks for noticing it.
>> I will submit either later today or early tomorrow, so if there's anything else let me know,
>
> I think you will have to explain how you want to avoid sharing of state and at least vaguely _what_ global state you want to avoid.
> The other proposal side-steps global state of individual GIMPLE passes by never executing the same GIMPLE pass twice at the same time, for example, so it's the scheduler avoiding shared state. What other global state do you have in mind?
>
> Richard.

If I understand correctly there are two: at the late IPA passes we can pass over a program partition multiple times, and that would be fixed; and for pre-RTL it seems cgraph_function_versioning has issues with copying functions more than once and then doing the transformation on different threads multiple times. If that's enough info I will just edit the roadmap, introduction and project parts to explain the details.

Nick
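As a concrete reading of "never executing the same GIMPLE pass twice at the same time", here is a minimal sketch; it enforces the invariant with a per-pass mutex rather than a real work scheduler, and the pass and function types are placeholders, not GCC's opt_pass machinery:

#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct function_unit { std::string name; };  // stands in for struct function

struct pass
{
  std::string name;
  std::mutex in_use;                         // serializes this pass only
  std::function<void (function_unit &)> execute;
};

// Run the whole pipeline on one function.  Different functions may be
// processed by different threads, but a given pass only ever runs on one
// function at a time, so its per-pass global state is never shared.
static void
run_pipeline (std::vector<pass> &pipeline, function_unit &fn)
{
  for (pass &p : pipeline)
    {
      std::lock_guard<std::mutex> guard (p.in_use);
      p.execute (fn);
    }
}

int
main ()
{
  std::vector<pass> pipeline (2);
  pipeline[0].name = "ccp";
  pipeline[0].execute = [] (function_unit &) { /* ... */ };
  pipeline[1].name = "dce";
  pipeline[1].execute = [] (function_unit &) { /* ... */ };

  std::vector<function_unit> fns = { { "foo" }, { "bar" }, { "baz" } };
  std::vector<std::thread> workers;
  for (function_unit &fn : fns)
    workers.emplace_back ([&pipeline, &fn] { run_pipeline (pipeline, fn); });
  for (std::thread &t : workers)
    t.join ();
}

A real scheduler would order the work to avoid the contention this naive locking creates, but the invariant is the same: per-pass global state is only ever touched by one thread at a time.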