Subject: Re: GSOC Proposal
To: Richard Biener
Cc: GCC Development
From: nick
Date: Mon, 08 Apr 2019 13:19:00 -0000

On 2019-04-08 3:29 a.m., Richard Biener wrote:
> On Sun, 7 Apr 2019, nick wrote:
>
>> On 2019-04-07 5:31 a.m., Richard Biener wrote:
>>> On April 5, 2019 6:11:15 PM GMT+02:00, nick wrote:
>>>>
>>>> On 2019-04-05 6:25 a.m., Richard Biener wrote:
>>>>> On Wed, 3 Apr 2019, nick wrote:
>>>>>>
>>>>>> On 2019-04-03 7:30 a.m., Richard Biener wrote:
>>>>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>>>>>
>>>>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote:
>>>>>>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>>>>>>
>>>>>>>>>> Well, I'm talking about the shared roots of this garbage collector's
>>>>>>>>>> core state data structure, i.e. struct ggc_root_tab.
>>>>>>>>>>
>>>>>>>>>> But it also seems that this is no longer shared globally, if I'm not
>>>>>>>>>> mistaken, or this:
>>>>>>>>>> static vec<void *> extra_root_vec;
>>>>>>>>>>
>>>>>>>>>> Not sure after reading the code which is the bigger deal, though, so I
>>>>>>>>>> wrote my proposal without just asking which is the better issue for
>>>>>>>>>> not being thread safe. Sorry about that.
>>>>>>>>>>
>>>>>>>>>> As for the second question, injection seems not to be the issue for
>>>>>>>>>> outside callers but just internal ones, so phase 3 or step 3 would now
>>>>>>>>>> be: find internal callers or users of x, where x is one of the above,
>>>>>>>>>> rather than injecting outside callers. Which answers my second question
>>>>>>>>>> about external callers still being an issue.
>>>>>>>>>>
>>>>>>>>>> Let me know which of the two is the better issue:
>>>>>>>>>> 1. struct ggc_root_tabs being shared
>>>>>>>>>> 2. static vec<void *> extra_root_vec; as a shared heap or
>>>>>>>>>>    vector of root nodes for each type of allocation
>>>>>>>>>>
>>>>>>>>>> and I will gladly rewrite my proposal sections for that,
>>>>>>>>>> as they need to be re-edited.
>>>>>>>>>
>>>>>>>>> I don't think working on the garbage collector as a separate
>>>>>>>>> GSoC project is useful at this point. Doing locking around
>>>>>>>>> allocation seems like a good short-term solution and if that
>>>>>>>>> turns out to be a performance issue for the threaded part
>>>>>>>>> using per-thread freelists is likely an easy to deploy
>>>>>>>>> solution.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>> I agree, but we were discussing this:
>>>>>>>> Or maybe a project to be more explicit about regions of the code that
>>>>>>>> assume that the garbage collector can't run within them?[3] (since the
>>>>>>>> GC is state that would be shared by the threads).
>>>>>>>
>>>>>>> The process of collecting garbage is not the only issue (and that
>>>>>>> very issue is easiest mitigated by collecting only at specific
>>>>>>> points - which is what we do - and having those be serializing points).
>>>>>>> The main issue is the underlying memory allocator (GCC uses memory
>>>>>>> that is garbage collected plus regular heap memory).
>>>>>>>
>>>>>>>> In addition, I moved my paper back to our discussion about garbage
>>>>>>>> collector state with outside callers. It seems we really need to do
>>>>>>>> something about my wording, as the idea of my project in a nutshell was
>>>>>>>> to figure out how to mark shared state by callers and inject that into
>>>>>>>> the garbage collector, letting it know whether the state was shared
>>>>>>>> between threads or not. That seems to be what was on the GSoC page, and
>>>>>>>> in our discussions the issue is marking outside code for shared state.
>>>>>>>> If that's correct, then my wording of "outside callers" is wrong; it
>>>>>>>> should have been shared state between threads for outside callers to
>>>>>>>> the garbage collector. If that state is what you describe in your
>>>>>>>> wording above then great, as I understand where we are going and will
>>>>>>>> gladly change my wording.
>>>>>>>
>>>>>>> I'm still not sure what you are shooting at, the above sentences do
>>>>>>> not make any sense to me.
>>>>>>>
>>>>>>>> Also, freelists don't work here, as the state is shared at the caller's
>>>>>>>> end, which raises two major issues:
>>>>>>>> 1. Locking on nodes of the freelists when two threads allocate the same
>>>>>>>>    thing, which can be a problem if the state is shared a lot
>>>>>>>> 2. Locking allocation with large numbers of callers can starve threads
>>>>>>>
>>>>>>> First of all allocating memory from the GC pool is not the main
>>>>>>> work of GIMPLE passes so simply serializing at allocation time might
>>>>>>> work out. Second, free lists of course do work.
>>>>>>> What you'd do is have a fast path in allocation using a thread-local
>>>>>>> "free list" which you can allocate from without taking any lock. Maybe I
>>>>>>> should explain "free list" since that term doesn't make too much sense in
>>>>>>> a garbage collector world. What I'd do is, when a client thread
>>>>>>> asks for memory of size N, allocate M objects of that size but put
>>>>>>> M - 1 on the client thread-local "free list" to be allocated lock-free
>>>>>>> from for the next M - 1 calls. Note that garbage collected memory
>>>>>>> objects are only handed out in fixed chunks (powers of two plus
>>>>>>> a few special sizes) so you'd have one "free list" per chunk size
>>>>>>> per thread.
>>>>>>>
>>>>>>> The collection itself (mark & sweep) would be fully serialized still
>>>>>>> (and not return to any thread's local "free list").
>>>>>>>
>>>>>>> ggc_free'd objects _might_ go to the thread's "free list"s (yeah, we
>>>>>>> _do_ have ggc_free ...).
>>>>>>>
>>>>>>> As said, I don't see GC or the memory allocator as something interesting
>>>>>>> to work on for parallelization until the basic setup works and it
>>>>>>> proves to be a bottleneck.
>>>>>>>
>>>>>>>> It seems that working on the garbage collector itself isn't the issue
>>>>>>>> but the callers, as I just figured out in relation to your state idea.
>>>>>>>> Let me know if that's correct and if the wording change I mentioned is
>>>>>>>> fine with you, as that's the state it seems needs to be changed.
>>>>>>>> Nick
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>> That's fine, and it's my fault for not understanding you better. I was
>>>>>> aware of expand_all_functions being taken for passes.c. However it seems
>>>>>> two other issues are these sets as related to threads:
>>>>>> 1. finalize_compilation_unit
>>>>>> 2. the IPA set of pass functions
>>>>>>
>>>>>> If I'm understanding it correctly, number 1 seems to be an early version
>>>>>> of expand_all_functions for the GENERIC representation; if that's the case
>>>>>> it really should be fixed. Not sure which is the better issue, as both
>>>>>> seem to have shared-state problems, either at the GENERIC level or the
>>>>>> GIMPLE level.
>>>>>>
>>>>>> Let me know if this is better, as it seems, now that I really think about
>>>>>> it, that GIMPLE or GENERIC functions in passes.c are the main issue.
>>>>>>
>>>>>> Sorry for the misunderstanding, and hopefully one of the functions listed
>>>>>> is better for moving forward with my proposal,
>>>>>
>>>>> Sorry, but guessing at useful projects by skimming through GCC code
>>>>> at this point isn't the way to go forward - this new "idea" lacks
>>>>> both detail and understanding. Please try to stick to one of the
>>>>> suggested projects or do more thorough research in case you want
>>>>> to work on a new project idea next year.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>> I was talking about cgraphunit.c, and it seems that according to this:
>>>> Parallelize compilation using threads. GCC currently has an awful lot
>>>> of truly global state and even more per-pass global state which makes
>>>> this somewhat hard. The idea is to avoid issues with global state as
>>>> much as possible by partitioning the compilation pipeline in pieces
>>>> that share as little global state as possible and ensure each thread
>>>> only works in one of those partitions. The biggest roadblock will be
>>>> the not thread-safe memory allocator of GCC's garbage collector.
>>>> The goal of this project is to have a compilation pipeline driven by a
>>>> scheduler assigning functions to be optimized to the partitions in the
>>>> pipeline. This project would be mentored by Richard Biener. Required
>>>> skills include: C/C++, ability to analyze a big complex code base,
>>>> parallelization
>>>>
>>>> We are trying to create a compilation pipeline, if I'm correct, and it
>>>> seems that the GENERIC level needs finalize_compilation_unit to be fixed
>>>> like expand_all_functions at the GIMPLE level. That's my point, it still
>>>> is within that project. Here is what I wrote, as I figured out that it is
>>>> shared state related to GENERIC passing to GIMPLE which is, or would be,
>>>> a bottleneck in the threaded pipeline.
>>>>
>>>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit
>>>
>>> The pre-IPA and IPA parts cannot be easily parallelized, and that includes
>>> GENERIC to GIMPLE translation. This is why the project should focus on the
>>> much easier post-IPA and pre-RTL parts of the compilation pipeline, since
>>> there the interaction between functions is minimal.
>>>
>>> Richard.
>>>
>>>> Nick
>>>
>> Richard,
>> Thanks, it seems we finally got somewhere; I rewrote it for that. Please
>> just double-check tomorrow before I submit later tomorrow, as I just want
>> to make sure it's OK with you. Here is the link:
>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit
>>
>> The only thing is whether you want more detail in the project part; I think
>> it's enough, but let me know if not.
>
> It seems the project part lacks any specifics and those are to be
> researched and discovered in the mid-May to late May timeframe of
> your Roadmap?

Yes, this is because after looking into the IPA functions it would take some
time to really understand them, not just at a surface level. It's not the
functions themselves but thinking about how the pipeline would work in the
post-IPA and pre-RTL functions.

> I do not understand what "Create optimized functions ... for avoiding
> sharing of state ..." is about - what kind of functions do you intend
> to write? Often examples help.

What I actually meant was: fix functions at both the post-IPA and pre-RTL
levels to avoid sharing of state and to increase throughput with multiple
threads in the compiler. There aren't any new functions; it's optimizing the
existing paths. So it was a small typo. Thanks for noticing it.

I will submit either later today or early tomorrow, so if there's anything
else let me know,
Nick

> Richard.
>
>> Thanks for your patience,
>> Nick
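
For reference, the per-thread "free list" scheme Richard sketches above could
look roughly like the sketch below. This is a minimal illustration and not GCC
code: the names (gc_alloc, gc_free, pool_alloc_locked, thread_freelist,
REFILL_COUNT, chunk_index) are hypothetical, the shared GC pool is stood in for
by plain operator new, and the real ggc chunk sizes (powers of two plus a few
special sizes) are simplified to powers of two.

// Hedged sketch of a per-thread free-list fast path over a serialized pool.
// All identifiers are illustrative only, not GCC internals.
#include <cstddef>
#include <mutex>

using std::size_t;

constexpr size_t NUM_CHUNK_SIZES = 16; // one bucket per fixed chunk size
constexpr size_t REFILL_COUNT = 32;    // M: objects fetched per slow-path trip

struct free_node { free_node *next; };

// Per-thread cache: one lock-free list per chunk size.
static thread_local free_node *thread_freelist[NUM_CHUNK_SIZES] = {};

static std::mutex pool_mutex; // serializes the shared (non-thread-safe) pool

// Hypothetical stand-in for the real pool allocator; must be called locked.
static void *
pool_alloc_locked (size_t chunk_bytes)
{
  return ::operator new (chunk_bytes);
}

// Map a request size to a chunk-size bucket (powers of two, simplified).
static size_t
chunk_index (size_t size)
{
  size_t idx = 0, chunk = 8;
  while (chunk < size)
    { chunk <<= 1; ++idx; }
  return idx;
}

static size_t
chunk_bytes (size_t idx)
{
  return size_t (8) << idx;
}

void *
gc_alloc (size_t size)
{
  size_t idx = chunk_index (size);

  // Fast path: pop from the thread-local free list without any locking.
  if (free_node *n = thread_freelist[idx])
    {
      thread_freelist[idx] = n->next;
      return n;
    }

  // Slow path: take the lock once, allocate M chunks, return one and push
  // the remaining M - 1 onto the thread-local list so the next M - 1 calls
  // stay lock-free.
  std::lock_guard<std::mutex> guard (pool_mutex);
  void *result = pool_alloc_locked (chunk_bytes (idx));
  for (size_t i = 1; i < REFILL_COUNT; ++i)
    {
      free_node *n
	= static_cast<free_node *> (pool_alloc_locked (chunk_bytes (idx)));
      n->next = thread_freelist[idx];
      thread_freelist[idx] = n;
    }
  return result;
}

// Explicitly freed objects might go back onto the calling thread's list.
void
gc_free (void *p, size_t size)
{
  free_node *n = static_cast<free_node *> (p);
  size_t idx = chunk_index (size);
  n->next = thread_freelist[idx];
  thread_freelist[idx] = n;
}

As in the message above, the collection itself (mark & sweep) would remain
fully serialized and would not hand objects back to any thread-local list;
only allocation and explicit freeing take the per-thread fast path here.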