From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 52994 invoked by alias); 21 Mar 2018 19:04:35 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 52983 invoked by uid 89); 21 Mar 2018 19:04:34 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.8 required=5.0 tests=BAYES_00,FORGED_HOTMAIL_RCVD2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,SPF_PASS autolearn=no version=3.3.2 spammy=Their, sebaspe97hotmailcom, sebaspe97@hotmail.com, H*i:sk:gB_Zfwf X-HELO: EUR01-HE1-obe.outbound.protection.outlook.com Received: from mail-oln040092065032.outbound.protection.outlook.com (HELO EUR01-HE1-obe.outbound.protection.outlook.com) (40.92.65.32) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 21 Mar 2018 19:04:32 +0000 Received: from HE1EUR01FT055.eop-EUR01.prod.protection.outlook.com (10.152.0.51) by HE1EUR01HT248.eop-EUR01.prod.protection.outlook.com (10.152.1.124) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.567.18; Wed, 21 Mar 2018 19:04:29 +0000 Received: from AM2PR05MB1186.eurprd05.prod.outlook.com (10.152.0.52) by HE1EUR01FT055.mail.protection.outlook.com (10.152.1.28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.567.18 via Frontend Transport; Wed, 21 Mar 2018 19:04:29 +0000 Received: from AM2PR05MB1186.eurprd05.prod.outlook.com ([fe80::f933:a57f:2b17:e4e6]) by AM2PR05MB1186.eurprd05.prod.outlook.com ([fe80::f933:a57f:2b17:e4e6%17]) with mapi id 15.20.0588.017; Wed, 21 Mar 2018 19:04:28 +0000 From: Sebastiaan Peters To: Richard Biener CC: David Malcolm , Sebastiaan Peters , "gcc@gcc.gnu.org" Subject: Re: GSOC Question about the parallelization project Date: Wed, 21 Mar 2018 19:04:00 -0000 Message-ID: References: <2C17E153-DCDC-4E24-B9D3-64E0FF358758@gmail.com> <1521557360.5688.30.camel@redhat.com> , In-Reply-To: x-incomingtopheadermarker: OriginalChecksum:D52E996460E5335FACCA25AEB7816228BD457346A9FC6206C06E3FBB70DD3FE1;UpperCasedChecksum:55441759F9243B88CDBDA631667E9F0469F36E3968652517FEAAEE0955B8382C;SizeAsReceived:8102;Count:47 x-ms-exchange-messagesentrepresentingtype: 1 x-tmn: [HtTR7PC24GsoC2gaWG9kXT8I6E/oeIK3nHfF7gsCc4R78/uznArKa66dpEo5+adZ7h+M8A/yCzE=] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1;HE1EUR01HT248;6:yXSmAM17/bIi060r1eTnkcYhxnD/1TMywIxlbmESlgaSX9WDx1LwSOnFukC8FdVe5HV8cqFQqK+5pFQDcWquq07wXp47FPOAFd1Ea80M1kSviCv8uR/pZARYZ5mNno0ik45e67A26vW6ylCK/P7QcC6DaDMqZJpUnX3DLiXNGrUbxBIaXaH7g58yCuXhEawbKRI6qolYfURmJ2Bg19QIVRGZAmGr9q0ZlSbCFYQCp1Kb7JhaAJv6XLDC1OtZynIAiBOfn9IO6aEYgRuCXtFrYP75SGFYmKHvgQ+DWnf2rXWtR/2aGOjd+Ym0iGgEw999GAbKGR+TDu1/rTyU5r30/LcBZKGw2nwbdILfM0plKE4=;5://KGAMmAyNYHxhvERcuR5o5can3eu0QK4JuCLW2Ljl9JkqCrZuuQktw4/yE/IipU15G46OVv6Sg+tnG+Vz5Ovl9aS6Yz0iYjI/YEyYzb5gNyafiMJZo9y6PPKporXFL+pvbaTvS5aFuzMBv11GCC/v0GqlNxRHRkyU0F6xmd/7Y=;24:y8Xe5XLSPdsog10i0QFlFCeqs+5D/2r/ue6ijSKSep9Hec6bTeCGDTUmx0tWDFIR8Rk8z+t9UI3h33F/NRdmj2CBAKFnbg43gu5YB/m2I3s=;7:rAKU2hWknch7MtN8g0TRtqX3OIUBcQsKqZYcz7v2SCpDXKRvWpQFMYuuL09e1vno7zpxlQwGp1fR9/BtM9GJLNn5oAihqJIUrjnxWnTbrE/PGjJOEH53rn5cvhGgyCoMHQMnse3q7tre2mWPvXbf6xDgPnVuGiEwD8CZL3nmL51TW/FEem+OpBbZ92u5UL/9hXbIEnJkAG2Zp6XlsjGzSPxmH7naAMAOJqUgFcDRO+GbWB7esq3IkRlJ3Exnb7tx x-incomingheadercount: 47 x-eopattributedmessage: 0 x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(7020095)(201702061078)(5061506573)(5061507331)(1603103135)(2017031320274)(2017031324274)(2017031323274)(201702181274)(2017031322404)(1601125374)(1603101448)(1701031045);SRVR:HE1EUR01HT248; x-ms-exchange-slblob-mailprops: =?iso-8859-1?Q?IkxdR8L5RzEka7nC3z+0v5KG5o4hYZJvSkUV3rGaG9Esq5u0tvhDq3Xsem?= =?iso-8859-1?Q?8GxcBKusnTKS7eFdtOIn6xjsiicU1zwwzxAhaDNXTzQR8POIXOnWJ6H19w?= =?iso-8859-1?Q?7IPzWtmx/b7JFFAH+9vd0nyAWERXCcdRawVrm6+IqN4RhfR/FTKppgXG3x?= =?iso-8859-1?Q?8UdX4mZJ8Szr31Ru8VIdwE2tmP/lxcqIf/Q4Y5SnUQ+6IHsV1M7zIca0bB?= =?iso-8859-1?Q?N6wCbBirBUQ6f7IPeUgEryZZMsBxh6HnhSCYHa04WmX5+eyK9CCgr2kZr7?= =?iso-8859-1?Q?Aa4hbVNnoTdXpTif4fKtFUO+yIYFc6Dk7d/9CgmjgzDD7/zF8T7BD+9Cyq?= =?iso-8859-1?Q?RnqapzOW8FxshM82oQg031w/wQLw/ecnUqpURMR/yuoXFapO4W5hP4Si9l?= =?iso-8859-1?Q?oTXjnuhcmmwBrx0yXfKoEpUZNFy+AX2PDYiuSHyhNFg35UITus+N0oA8tP?= =?iso-8859-1?Q?njtb5Y33RDGG7MhRqGSGvtNWTeSg8/DNhvcPFIPtuofjQW+5CsbHrBJoMD?= =?iso-8859-1?Q?QHp3YwI8OQ0xMat/TkKdtAesmK7WTn8jsD9fSclSMYBKLTDYW98Wy5VpRG?= =?iso-8859-1?Q?wqU0QqI3OkuEedVwg6c/rtgcYis1N+J++MxQIJ+WbisPrNBx4uPtW2Q6WO?= =?iso-8859-1?Q?245cbq8B/Id+z3iOtPQl0c1+Duw0zRJFQrjULhiWcOB002jCXjqwZBMWGw?= =?iso-8859-1?Q?RVml5i68X699C/wFPk1IZ/C2uoUKQx0K1UZdMpH9Ynkf4XgYEoq7SX8FI6?= =?iso-8859-1?Q?XgDcnq+WbDD9d61jtzveB7oWYNYi6azlwyBvfIuprNrmm2JEtjkEbWFaP9?= =?iso-8859-1?Q?gPbh1VgSO0NpHf3d0Pz2OWWvKviicnKcmfQsXadSagIY4yQOpM3wC1qMT7?= =?iso-8859-1?Q?kvF13dNXH3Kn4/KMb/TJPq5mdqe+k+uPSl4bbqgVhlUL2ESStGm56BQIF5?= =?iso-8859-1?Q?cUwi1cYA39lhM4wa2m8XjJyiO8MeXz9S01U9yHcxEYfOqyBkDH7+dEwqvd?= =?iso-8859-1?Q?69iBd2spDtbS7uDxS/pBHV+K0i/hOvPOrmHR1R2t2sr8ZlpRqqwG//VoXN?= =?iso-8859-1?Q?FA=3D=3D?= x-ms-traffictypediagnostic: HE1EUR01HT248: x-ms-office365-filtering-correlation-id: 3785a382-94f3-4d62-122a-08d58f5e9275 x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(444000031);SRVR:HE1EUR01HT248;BCL:0;PCL:0;RULEID:;SRVR:HE1EUR01HT248; x-forefront-prvs: 0618E4E7E1 x-forefront-antispam-report: SFV:NSPM;SFS:(7070007)(98901004);DIR:OUT;SFP:1901;SCL:1;SRVR:HE1EUR01HT248;H:AM2PR05MB1186.eurprd05.prod.outlook.com;FPR:;SPF:None;LANG:; x-microsoft-antispam-message-info: sXd9MrdDFpfVqITlAttTq/qfzaZIk8atiuATOTVhSh67y002nBoiO+rYViKZWPTm3zwDf8Rtt6diZQgg5aGfiw2Hg+FX9dTgmddyMwNkASsvorqIHreesfQ/OuIQSjyiFIcUf5KIW2+wMZZAnG2mJ2z4lryNuKhinNqLDYtwPHXtFZzKiMovD+zlZK0/A6sg spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: hotmail.com X-MS-Exchange-CrossTenant-Network-Message-Id: 3785a382-94f3-4d62-122a-08d58f5e9275 X-MS-Exchange-CrossTenant-originalarrivaltime: 21 Mar 2018 19:04:28.8699 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Internet X-MS-Exchange-CrossTenant-id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1EUR01HT248 X-SW-Source: 2018-03/txt/msg00215.txt.bz2 >On Wed, Mar 21, 2018 at 9:50 AM, Sebastiaan Peters > wrote: >>>On Tue, Mar 20, 2018 at 3:49 PM, David Malcolm wro= te: >>>> On Tue, 2018-03-20 at 14:02 +0100, Richard Biener wrote: >>>>> On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener >>>>> wrote: >>>>> > On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters >>>> > 7@hotmail.com> wrote: >>>>> > > > The goal should be to extend TU wise parallelism via make to >>>>> > > > function >>>>> > > >>>>> > > wise parallelism within GCC. >>>>> > > >>>>> > > Could you please elaborate more on this? >>>>> > >>>>> > In the abstract sense you'd view the compilation process separated >>>>> > into N stages, each function being processed by each. You'd assign >>>>> > a thread to each stage and move the work items (the functions) >>>>> > across the set of threads honoring constraints such as an IPA stage >>>>> > needing all functions completed the previous stage. That allows you >>>>> > to easier model the constraints due to shared state (like no pass >>>>> > operating on two functions at the same time) compared to a model >>>>> > where you assign a thread to each function. >>>>> > >>>>> > You'll figure that the easiest point in the pipeline to try this >>>>> > 'pipelining' is after IPA has completed and until RTL is generated. >>>>> > >>>>> > Ideally the pipelining would start as early as the front ends >>>>> > finished parsing a function and ideally we'd have multiple >>>>> > functions in the RTL pipeline. >>>>> > >>>>> > The main obstacles will be the global state in the compiler of >>>>> > which there is the least during the GIMPLE passes (mostly cfun and >>>>> > current_function_decl plus globals in the individual passes which >>>>> > is easiest dealt with by not allowing a single pass to run at the >>>>> > same time in multiple threads). TLS can be used for some of the >>>>> > global state plus of course some global data structures need >>>>> > locking. >> >> This would mean that all the passes have to be individually analyzed for >> which global state >> they use, and lock/schedule them accordingly? > >Their private global state would be excempt by assuring that a pass never >runs twice at the same time. > >The global state that remains should be the same for all passes we are tal= king >about (during the late GIMPLE optimization phase which I'd tackle). > >> If this is the case, is there any documentation that describes the pre-r= eqs >> for each pass? >> I have looked at the internal documentation, however it seems that all of >> this still has to be created? > >The prereqs are actually the same and not very well documented (if at all). >There's the global GC memory pool where we allocate statements and >stuff like that from (and luckyly statements themselves are function priva= te). >Then there's global trees like types ('int') where modification needs to be >appropriately guarded.=A0 Note that "modification" means for example >building a new type for the address of 'int' given that all different >pointer types >to 'int' are chained in a list rooted in the tree for 'int'.=A0 That >means (a few?) >tree building helpers need to be guarded with locks.=A0 I don't have a gre= at >idea how to identify those apart from knowing them in advance or running >into races ... my gut feeling is that there's not a lot to guard but I may >be wrong ;) What does it mean to be a node of a type tree? Does it describe information about that type, or does it keep a reference to where something of that type has been declared? >> As to how this would be implemented, it seems the easiest way would be to >> extend the macros to >> accept a pre-req pass. This would encourage more documentation since the >> dependencies >> become explicit instead of the current implicit ordering. > >Actually the order is quite explicit.=A0 Maybe I now understand your >question - no, >passes do not "communicate" between each other via global state, all such >state is per function and the execution order of passes on a given function >is hard-coded in passes.def. > >> Assuming the dependencies for the all the tree-ssa passes have to be >> individually analyzed. >> Currently I have this as my timeline: >>=A0 =A0 =A0- Parallelize the execution of the post-IPA pre-RTL, and a few= tree-ssa >> passes (mid-may - early june) >>=A0 =A0 =A0- Test for possible reproducibility issues for the binary/debu= g info >> (early june - late june) >>=A0 =A0 =A0- Parallelize the rest of tree-ssa (late june - late july) >>=A0 =A0 =A0- Update documentation and test again for reproducibility issu= es (late >> july - early august) >> >> Would this be acceptable? > >Sounds ambitious ;)=A0 But yes, it sounds reasonable.=A0 I don't exactly >understand what "Parallelize the rest of tree-ssa" means though.=A0 If >you want to tackle a tiny bit more I'd rather include "RTL" by treating >the whole RTL part as a single "pass" (as said only one function can >be in RTL right now). > I was under the assumption that passes had to be indivdually analysed when I wrote that. The timeline is now updated to include some extra time in the beginning to familiarize myself a bit more with the project, and added the RTL as an optional deliverable:) I added a draft of the proposal[0], any feedback would be highly appreciate= d. >>>>> Oh, and just to mention - there are a few things that may block >>>>> adoption in the end >>>>> like whether builds are still reproducible (we allocate things like >>>>> DECL_UID from >>>>> global pools and doing that somewhat randomly because of threading >>>>> might - but not >>>>> must - change code generation).=A0 Or that some diagnostics will appe= ar >>>>> in >>>>> non-deterministic order, or that dump files are messed up (both >>>>> issues could be >>>>> solved by code dealing with the issue, like buffering and doing a re- >>>>> play in >>>>> program order).=A0 I guess reproducability is important when it comes >>>>> down to >>>>> debugging code-generation issues - I'd prefer to debug gcc when it >>>>> doesn't run >>>>> threaded but if that doesn't reproduce an issue that's bad. >>>>> >>>>> So the most important "milestone" of this project is to identify such >>>>> issues and >>>>> document them somewhere. >>>> >>>> One issue would be the garbage-collector: there are plenty of places in >>>> GCC that have hidden assumptions that "a collection can't happen here" >>>> (where we have temporaries that reference GC-managed objects, but which >>>> aren't tracked by GC-roots). >>>> >>>> I had some patches for that back in 2014 that I think I managed to drop >>>> on the floor (sorry): >>>>=A0 =A0https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01300.html >>>>=A0 =A0https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01340.html >>>>=A0 =A0https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01510.html >> >> Would there be a way to easily create a static analyzer to find these >> untracked temporaries? > >I don't think so. > >> A quick look at registered passes makes me count ~135 tree-ssa passes, >> So your code on analyzing what globals are referenced where might come in >> handy while analyzing if passes are easily parallelized. > >I think the "solution" to the garbage collecting issue is to simply keep >that serialized.=A0 It's currently done on-demand anyway at certain >safe collection points so the work scheduler can simply hold off >scheduling more work when the collector would want to run, waiting for >running jobs to complete. > >Richard. > >>>> The GC's allocator is used almost everywhere, and is probably not >>>> thread-safe yet. >>>Yes.=A0 There's also global tree modification like chaining new >>>pointer types into TYPE_POINTER_TO and friends so some >>>helpers in tree.c need to be guarded as well. >>>> FWIW I gave a talk at Cauldron 2013 about global state in GCC.=A0 Bewa= re: >>>> it's five years out-of-date, but maybe is still relevant in places? >>>>=A0 =A0https://dmalcolm.fedorapeople.org/gcc/global-state/ >>>>=A0 =A0https://gcc.gnu.org/ml/gcc/2013-05/msg00015.html >>>> (I tackled this for libgccjit by instead introducing a mutex, a "big >>>> compiler lock", jit_mutex in gcc/jit/jit-playback.c, held by whichever >>>> thread is calling into the rest of the compiler sources). >>>> >>>> Hope this is helpful >>>> Dave >>>> >>>>=A0 [0] https://docs.google.com/document/d/1YkKYI3J-pKFfHLdxljWVoJ89l2oVGibxslI= W3ikluz8/edit?usp=3Dsharing