From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-198753-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 30370 invoked by alias); 7 Apr 2019 09:31:53 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 30361 invoked by uid 89); 7 Apr 2019 09:31:53 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-5.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 spammy=skills, figured, docs.google.com, GSOC
X-HELO: mx1.suse.de
Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 07 Apr 2019 09:31:51 +0000
Received: from relay2.suse.de (unknown [195.135.220.254])	by mx1.suse.de (Postfix) with ESMTP id E602DAFEF;	Sun,  7 Apr 2019 09:31:48 +0000 (UTC)
Date: Sun, 07 Apr 2019 09:31:00 -0000
User-Agent: K-9 Mail for Android
In-Reply-To: <ec115fb3-ee31-b697-e370-1405ee14cabf@gmail.com>
References: <4327491a-395b-bee0-145a-eddd8f64b0ba@gmail.com> <CAFiYyc0AbLoU6=OD6D0G=nRbtVSdujrQNBH-miNWiF5QW2kijA@mail.gmail.com> <f5ab7c50-d8a0-cdae-dc0c-218a90303f3b@gmail.com> <alpine.LSU.2.20.1903290959560.27537@zhemvz.fhfr.qr> <63e78666-ceca-94d8-9ac4-101130afab4c@gmail.com> <70316c90-b241-3f88-56d8-9e59f3eac0ee@gmail.com> <alpine.LSU.2.20.1904011153200.27537@zhemvz.fhfr.qr> <038afc5c-85fa-8d78-5b84-289207c7b17f@gmail.com> <alpine.LSU.2.20.1904011546340.27537@zhemvz.fhfr.qr> <ebe2bcc8-f4bb-83bd-3c41-b110f762d209@gmail.com> <alpine.LSU.2.20.1904031320420.27537@zhemvz.fhfr.qr> <1eaa53df-3ed9-2a5f-d6db-2a224ee9da01@gmail.com> <alpine.LSU.2.20.1904051223450.27537@zhemvz.fhfr.qr> <ec115fb3-ee31-b697-e370-1405ee14cabf@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: GSOC Proposal
To: nick <xerofoify@gmail.com>
CC: GCC Development <gcc@gcc.gnu.org>
From: Richard Biener <rguenther@suse.de>
Message-ID: <DC08D4BB-8B93-4C6E-A9C7-2216AED0A563@suse.de>
X-SW-Source: 2019-04/txt/msg00098.txt.bz2

On April 5, 2019 6:11:15 PM GMT+02:00, nick <xerofoify@gmail.com> wrote:
>
>
>On 2019-04-05 6:25 a.m., Richard Biener wrote:
>> On Wed, 3 Apr 2019, nick wrote:
>>
>>>
>>>
>>> On 2019-04-03 7:30 a.m., Richard Biener wrote:
>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote:
>>>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>>>
>>>>>>> Well I'm talking about the shared roots of this garbage collector core state
>>>>>>> data structure or just struct ggc_root_tab.
>>>>>>>
>>>>>>> But also this seems that this to be no longer shared globally if I'm not mistaken
>>>>>>> or this:
>>>>>>> static vec<const_ggc_root_tab_t> extra_root_vec;
>>>>>>>
>>>>>>> Not sure after reading the code which is a bigger deal through so I wrote
>>>>>>> my proposal not just asking which is a better issue for not being thread
>>>>>>> safe. Sorry about that.
>>>>>>>
>>>>>>> As for the second question injection seems to not be the issue or outside
>>>>>>> callers but just internal so phase 3 or step 3 would now be:
>>>>>>> Find internal callers or users of x where x is one of the above rather
>>>>>>> than injecting outside callers. Which answers my second question about
>>>>>>> external callers being a issue still.
>>>>>>>
>>>>>>> Let me know which of the two is a better issue:
>>>>>>> 1. struct ggc_root_tabs being shared
>>>>>>> 2. static vec<const_ggc_root_tab_t> extra_root_vec; as a shared heap or
>>>>>>> vector of root nodes for each type of allocation
>>>>>>>
>>>>>>> and I will gladly rewrite my proposal sections for that
>>>>>>> as needs to be reedited.
>>>>>>
>>>>>> I don't think working on the garbage collector as a separate
>>>>>> GSoC project is useful at this point.  Doing locking around
>>>>>> allocation seems like a good short-term solution and if that
>>>>>> turns out to be a performance issue for the threaded part
>>>>>> using per-thread freelists is likely an easy to deploy
>>>>>> solution.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>> I agree but we were discussing this:
>>>>> Or maybe a project to be more
>>>>> explicit about regions of the code that assume that the garbage-
>>>>> collector can't run within them?[3] (since the GC is state that would
>>>>> be shared by the threads).
>>>>
>>>> The process of collecting garbage is not the only issue (and that
>>>> very issue is easiest mitigated by collecting only at specific
>>>> points - which is what we do - and have those be serializing points).
>>>> The main issue is the underlying memory allocator (GCC uses memory
>>>> that is garbage collected plus regular heap memory).
>>>>
>>>>> In addition I moved my paper back to our discussion about garbage collector
>>>>> state with outside callers. Seems we really need to do something about
>>>>> my wording as the idea of my project in a nutshell was to figure
>>>>> out how to mark shared state by callers and inject it into the
>>>>> garbage collector letting it known that the state was not shared between
>>>>> threads or shared. Seems that was on the GSoc page and in our discussions the issue
>>>>> is marking outside code for shared state. If that's correct then my
>>>>> wording of outside callers is incorrect it should have been shared
>>>>> state between threads on outside callers to the garbage collector.
>>>>> If the state is that in your wording above then great as I understand
>>>>> where we are going and will gladly change my wording.
>>>>
>>>> I'm still not sure what you are shooting at, the above sentences do
>>>> not make any sense to me.
>>>>
>>>>> Also freelists don't work here as the state is shared at the caller's
>>>>> end which would need two major issues:
>>>>> 1. Locking on nodes of the freelists when two threads allocate at the
>>>>> same thing which can be a problem if the shared state is shared a lot
>>>>> 2. Locking allocation with large numbers of callers can starve threads
>>>>
>>>> First of all allocating memory from the GC pool is not the main
>>>> work of GIMPLE passes so simply serializing at allocation time might
>>>> work out.  Second free lists of course do work.  What you'd do is
>>>> have a fast path in allocation using a thread-local "free list"
>>>> which you can allocate from without taking any lock.  Maybe I should
>>>> explain "free list" since that term doesn't make too much sense in
>>>> a garbage collector world.  What I'd do is when a client thread
>>>> asks for memory of size N allocate M objects of that size but put
>>>> M - 1 on the client thread local "free list" to be allocated lock-free
>>>> from for the next M - 1 calls.  Note that garbage collected memory
>>>> objects are only handed out in fixed chunks (powers of two plus
>>>> a few special sizes) so you'd have one "free list" per chunk size
>>>> per thread.
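For illustration, the thread-local "free list" scheme described above
could be sketched roughly as follows. This is a minimal C++ sketch under
assumptions, not GCC's actual allocator: the names allocate, size_class,
free_lists, pool_lock, kBatch and kNumClasses are all hypothetical, plain
malloc stands in for the shared GC pool, only power-of-two size classes
are modeled, and collection (and releasing memory) is left out entirely.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Hypothetical sketch; none of these names exist in GCC.
namespace sketch {

constexpr std::size_t kMinBytes = 16;    // smallest chunk size
constexpr std::size_t kNumClasses = 8;   // chunk sizes 16, 32, ..., 2048
constexpr std::size_t kMaxBytes = kMinBytes << (kNumClasses - 1);
constexpr std::size_t kBatch = 32;       // "M": objects taken per refill

std::mutex pool_lock;                    // serializes the shared pool
std::size_t slow_path_calls = 0;         // refill count, guarded by pool_lock

// Index of the smallest power-of-two chunk size that fits n bytes.
inline std::size_t size_class(std::size_t n) {
  std::size_t cls = 0;
  for (std::size_t bytes = kMinBytes; bytes < n; bytes <<= 1) ++cls;
  return cls;
}

// One "free list" per chunk size per thread.
thread_local std::array<std::vector<void*>, kNumClasses> free_lists;

void* allocate(std::size_t n) {
  if (n > kMaxBytes) {
    // Oversized requests always take the lock in this sketch.
    std::lock_guard<std::mutex> guard(pool_lock);
    return std::malloc(n);
  }
  const std::size_t cls = size_class(n);
  std::vector<void*>& list = free_lists[cls];
  if (list.empty()) {
    // Slow path: take the lock once and carve out M objects.
    std::lock_guard<std::mutex> guard(pool_lock);
    ++slow_path_calls;
    const std::size_t bytes = kMinBytes << cls;
    char* block = static_cast<char*>(std::malloc(bytes * kBatch));
    if (block == nullptr) return nullptr;
    for (std::size_t i = 0; i < kBatch; ++i) list.push_back(block + i * bytes);
  }
  // Fast path: hand out one object without touching the lock.
  assert(!list.empty());
  void* p = list.back();
  list.pop_back();
  return p;
}

}  // namespace sketch
```

With these assumed constants, a thread calling sketch::allocate(24)
repeatedly takes the lock only on every 32nd call; the other calls are
served from its own list. A serialized collection phase would have to
discard or rebuild these per-thread lists, which the sketch does not
attempt.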
>>>>
>>>> The collection itself (mark & sweep) would be fully serialized still
>>>> (and not return to any threads local "free list").
>>>>
>>>> ggc_free'd objects _might_ go to the threads "free list"s (yeah, we
>>>> _do_ have ggc_free ...).
>>>>
>>>> As said, I don't see GC or the memory allocator as sth interesting
>>>> to work on for parallelization until the basic setup works and it
>>>> proves to be a bottleneck.
>>>>
>>>>> Seems that working on the garbage collector itself isn't the issue but
>>>>> the callers as I just figured out as related to your state idea. Let me
>>>>> know if that's correct and if the wording change I mentioned is fine
>>>>> with you as that's the state it seems that needs to be changed.
>>>>> Nick
>>>>
>>>> Richard.
>>>>
>>>
>>> That's fine and it's my fault for not understanding you better. I was aware
>>> of the expand_functions_all being taken for passes.c. However it seems
>>> two other issues are these sets as related to threads:
>>> 1. finalize_compilation_unit
>>> 2. and the ipa set of pass functions
>>>
>>> If I'm understanding it correctly number 1 seems to be a early version of
>>> expand_all_functions for the GENERIC representation if that's the case
>>> it really should be fixed. Not sure which is a better issue as both
>>> seem to have issues either at the GENERIC level or GIMPLE level with shared
>>> state.
>>>
>>> Let me know if this is better as it seems now that I really think about
>>> it GIMPLE or GENERIC functions in passes.c are the main issue.
>>>
>>> Sorry for the misunderstanding and hopefully one of functions listed is better
>>> for moving forward with my proposal,
>>
>> Sorry, but guessing at useful projects by skimming through GCC code
>> at this point isn't the way to go forward - this new "idea" lacks
>> both detail and understanding.  Please try to stick to one of the
>> suggested projects or do more thorough research in case you want
>> to work on a new project idea next year.
>>
>> Thanks,
>> Richard.
>>
>
>I was talking about cgraphunits.c and it seems that according to this:
>Parallelize compilation using threads. GCC currently has an awful lot
>of truly global state and even more per-pass global state which makes
>this somewhat hard. The idea is to avoid issues with global state as
>much as possible by partitioning the compilation pipeline in pieces
>that share as little global state as possible and ensure each thread
>only works in one of those partitions. The biggest roadblock will be
>the not thread-safe memory allocator of GCC garbage collector. The goal
>of this project is to have a compilation pipeline driven by a scheduler
>assigning functions to be optimized to the partitions in the pipeline.
>This project would be mentored by Richard Biener. Required skills
>include: C/C++, ability to analyze big complex code base,
>parallelization
>
>We are trying to create a rendering pipeline if I'm correct and it
>seems that the GENERIC level needs finalize_compilation_unit
>to be fixed like expand_all_functions at the GIMPLE. That's my point it
>still is within that project. Here is what I wrote
>as I figured out that it was shared state related to GENERIC passing to
>GIMPLE which is a bottleneck or would be in the
>threaded pipeline.
>
>https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit

The pre-IPA parts cannot be easily parallelized, and that includes GENERIC
to GIMPLE translation. This is why the project should focus on the much
easier post-IPA and pre-RTL parts of the compilation pipeline, since there
interaction between functions is minimal.

Richard.

>
>Nick