From: Martin Jambor
To: Anuj Mohite, gcc@gcc.gnu.org
Cc: Tobias Burnus
Subject: Re: GSoC 2024 [Fortran - DO CONCURRENT] Seeking feedback/suggestions for project proposal
Date: Sat, 30 Mar 2024 21:40:22 +0100

Hello Anuj,

On Thu, Mar 28 2024, Anuj Mohite wrote:
> Hi,
> I'm Anuj M, an undergraduate student interested in participating in
> GSoC 2024 with GCC. I would like to work on the project improving the
> DO CONCURRENT construct in the GFortran compiler. The current
> implementation in GFortran has limitations in handling locality
> clauses, supporting reduction operations, and parallelization
> strategies for DO CONCURRENT loops. So the proposal aims to address
> these limitations:

The timing of the GSoC contributor application deadline (the upcoming
Tuesday) is a bit unfortunate because of Easter: many involved mentors
have a long weekend (a public holiday on Friday or Monday or, like me,
both). So even if you do not receive any more feedback, please make
sure to apply - and don't leave it until the last day. IIUC, a
proposal can always be updated later.

I admit that I managed to have only a very quick look at your
proposal, but it all looked good to me.

Good luck!

Martin

>
> 1. Implementing locality clauses and ensuring correct handling of
> data dependencies.
> 2. Supporting reduction operations in DO CONCURRENT loops.
> 3. Developing parallelization strategies, including OpenMP-based
> parallelization and OpenMP offloading.
>
> I have added a detailed project proposal outlining the implementation
> approach, timeline, my relevant background, and experience.
>
> I would greatly appreciate feedback or suggestions from the GCC
> community regarding this project proposal.
>
> Best regards,
> Anuj M
>
> ## GCC, the GNU Compiler Collection - Google Summer of Code 24
> Proposal - Anuj Mohite
>
> Project: Fortran - DO CONCURRENT
>
> Abstract:
>
> The `DO CONCURRENT` construct, introduced in the Fortran 2008
> standard and extended with locality clauses in Fortran 2018, provides
> a mechanism to express parallelism in Fortran programs. However,
> fully leveraging its potential requires a systematic and
> comprehensive implementation within Fortran compilers. This proposal
> outlines a robust solution for implementing `DO CONCURRENT` support,
> encompassing parsing and handling of locality clauses, enabling
> reduction operations, and developing parallelization strategies
> utilising OpenMP. To ensure efficient parallel execution, performance
> optimization techniques will be employed. By facilitating efficient
> parallelization of `DO CONCURRENT` loops, this project aims to
> contribute to Fortran's continued performance in high-performance
> computing domains.
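>
> For illustration, a minimal self-contained sketch of the construct in
> its simplest Fortran 2008 form (array names and sizes are
> illustrative):
>
>   program demo_do_concurrent
>     implicit none
>     integer :: i
>     real :: a(100), b(100)
>     a = 2.0
>     ! The programmer asserts the iterations are free of loop-carried
>     ! dependencies, so the compiler may run them in any order - or in
>     ! parallel.
>     do concurrent (i = 1:100)
>       b(i) = a(i)**2
>     end do
>     print *, b(1)
>   end program demo_do_concurrent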
>
> Current State of Feature:
>
> At present, support for the `DO CONCURRENT` construct in the GFortran
> compiler is limited. The existing implementation only partially
> handles the locality clauses introduced in the Fortran 2018 standard,
> and it lacks support for reduction operations and parallelization
> strategies. As a result, the performance gains achievable through the
> `DO CONCURRENT` construct are not fully realised.
>
> The current implementation in GFortran involves a basic parser for
> the `DO CONCURRENT` construct and its locality clauses. However, the
> semantic analysis and code generation phases are incomplete, leading
> to incorrect handling of data dependencies and potential race
> conditions. Additionally, the compiler does not support reduction
> operations or any parallelization strategies for `DO CONCURRENT`
> loops, effectively executing them serially.
>
> Other Fortran compilers, such as NVIDIA's nvfortran and Intel's
> ifort, have implemented varying levels of support for `DO
> CONCURRENT`. However, their implementations often have limitations or
> restrictions, and their performance can vary depending on the
> specific workload and hardware architecture.
>
> Furthermore, as the Fortran language continues to evolve, with the
> Fortran 2023 standard introducing additional features and
> enhancements related to the `DO CONCURRENT` construct, it is crucial
> for compilers to stay up to date and provide comprehensive support
> for these language features.
>
> Project Goals:
>
> The primary goals of this project are:
>
> 1. Implement Locality Clauses:
>
> * Extend the GFortran compiler to support the locality clauses
> specified in the Fortran 2018 standard for the `DO CONCURRENT`
> construct.
> * Include parsing, semantic analysis, and code generation phases to
> handle specified data dependencies correctly.
> * Modify the compiler's parser to recognize the new syntax for `DO
> CONCURRENT` loops and locality clauses, constructing an accurate AST.
> * Enhance the semantic analysis phase to perform data dependency
> analysis, loop-carried dependency analysis, and alias analysis.
> * Resolve data dependencies and identify potential parallelization
> opportunities. (A small example of the target syntax follows this
> list.)
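>
> As a sketch of the syntax this goal targets, the following Fortran
> 2018 loop uses all three kinds of locality spec (variable names are
> illustrative; current GFortran only partially accepts these clauses,
> which is exactly the gap described above):
>
>   program demo_locality
>     implicit none
>     integer :: i
>     real :: tmp, a(100), b(100)
>     a = 1.0
>     ! default(none) forces every variable's locality to be stated;
>     ! tmp gets a private copy per iteration, a and b are shared.
>     do concurrent (i = 1:100) local(tmp) shared(a, b) default(none)
>       tmp = a(i) * 2.0
>       b(i) = tmp + 1.0
>     end do
>     print *, b(1)
>   end program demo_locality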
>
> 2. Support Reduction Operations:
>
> * Add support for reduction operations in the `DO CONCURRENT`
> construct, as introduced in the Fortran 2023 standard.
> * Involve parsing reduction clauses, semantic analysis for
> correctness, and generating optimized code for parallel reduction
> operations.
> * Extend the compiler's parser to recognize the new syntax for
> reduction clauses, constructing an accurate AST.
> * Enhance the semantic analysis phase to analyze reduction clauses
> and the loop body, identifying potential dependencies and ensuring
> the correctness of the reduction operation.
> * Employ techniques like data dependency analysis and alias analysis
> to accurately identify the variables involved in a reduction
> operation and ensure they are not modified outside the reduction
> context. (A sketch of the target syntax follows this list.)
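>
> A minimal sketch of the Fortran 2023 `REDUCE` locality spec this goal
> targets (names illustrative):
>
>   program demo_reduce
>     implicit none
>     integer :: i
>     real :: s, x(100)
>     x = 1.0
>     s = 0.0
>     ! reduce(+:s) asks for a race-free parallel sum into s.
>     do concurrent (i = 1:100) reduce(+:s)
>       s = s + x(i)
>     end do
>     print *, s   ! expected: 100.0
>   end program demo_reduce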
>
> 3. Parallelize DO CONCURRENT Loops:
>
> * Develop and integrate parallelization strategies for `DO
> CONCURRENT` loops into the GFortran compiler.
> * Include OpenMP-based parallelization and OpenMP offloading.
>
> OpenMP-based Parallelization:
>
> * Leverage the OpenMP API to enable thread-based parallelization of
> `DO CONCURRENT` loops on shared-memory systems.
> * Generate code to create OpenMP parallel regions around the `DO
> CONCURRENT` loop and distribute iterations across threads using
> work-sharing constructs.
> * Handle synchronization and reduction operations using OpenMP's
> reduction clauses or atomic operations.
>
> OpenMP Offloading:
>
> * Extend the OpenMP-based parallelization to support offloading `DO
> CONCURRENT` loops to accelerator devices such as GPUs, using the
> OpenMP target construct.
> * Generate code to detect and initialize accelerator devices and
> transfer data between host and device.
> * Generate compute kernels optimized for the accelerator
> architecture, and handle synchronization and result collection.
>
> Implementation:
>
> The proposed implementation involves modifying the GFortran
> compiler's parser, semantic analyzer, and code generator to handle
> the `DO CONCURRENT` construct and its associated clauses. The
> implementation is divided into several phases:
>
> 1. Parsing and AST Construction: Extend the parser to recognize the
> new syntax for `DO CONCURRENT` loops, locality clauses, and reduction
> clauses, constructing an abstract syntax tree (AST) that accurately
> represents these constructs. This phase will involve modifying the
> Fortran grammar rules and implementing the necessary parsing actions.
> The parser will need to handle syntax variations such as the presence
> or absence of locality clauses, reduction clauses, or both.
>
> 2. Semantic Analysis and Dependency Resolution: Analyze the AST
> constructed during the parsing phase to resolve data dependencies and
> identify parallelization opportunities. This will employ techniques
> such as data dependency analysis, loop-carried dependency analysis,
> alias analysis, polyhedral analysis, and array data-flow analysis to
> provide accurate dependency information and enable more aggressive
> optimizations.
>
> 3. Code Generation and Transformation: Generate optimized code for
> parallel execution of `DO CONCURRENT` loops, respecting the specified
> locality clauses and reduction operations and taking into account the
> information gathered during semantic analysis. This may involve
> techniques such as loop distribution, loop fission, loop fusion, loop
> blocking, loop unrolling, software pipelining, and the use of
> synchronization primitives to ensure efficient parallel execution on
> modern hardware architectures.
>
> 4. Parallelization Strategies: Implement parallelization strategies,
> namely OpenMP-based parallelization and OpenMP offloading. These
> strategies will involve generating the necessary code for parallel
> execution, load balancing, and synchronization.
>
> * OpenMP-based Parallelization:
>
> The OpenMP-based parallelization strategy will leverage the widely
> used OpenMP API to enable thread-based parallelization of `DO
> CONCURRENT` loops on shared-memory systems. This will involve
> generating code to create OpenMP parallel regions around the `DO
> CONCURRENT` loop, distributing the iterations across available
> threads using work-sharing constructs such as `omp parallel do` or
> `omp parallel loop`. The implementation will also handle
> synchronization and reduction operations using OpenMP's reduction
> clauses or atomic operations, as sketched below.
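>
> A hand-written analogue of the code the compiler might conceptually
> emit for `do concurrent (i = 1:n) reduce(+:s)` (a sketch of one
> possible lowering, not the actual GFortran output; compile with
> -fopenmp):
>
>   program demo_omp_lowering
>     implicit none
>     integer :: i, n
>     real :: s, x(1000)
>     n = 1000
>     x = 1.0
>     s = 0.0
>     ! Work-sharing loop with a private partial sum per thread,
>     ! combined at the end of the region.
>     !$omp parallel do reduction(+:s)
>     do i = 1, n
>       s = s + x(i)
>     end do
>     !$omp end parallel do
>     print *, s   ! expected: 1000.0
>   end program demo_omp_lowering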
>
> * OpenMP Offloading:
>
> The OpenMP offloading strategy will extend the OpenMP-based
> parallelization to support offloading `DO CONCURRENT` loops to
> accelerator devices, such as GPUs, using the OpenMP target construct.
> This will involve generating code to detect and initialize
> accelerator devices, transfer the necessary data between the host and
> the device, generate compute kernels optimized for the accelerator
> architecture, and handle synchronization and result collection, as
> sketched below.
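>
> A hand-written analogue of an offloaded `do concurrent (i = 1:n)`
> loop (again a conceptual sketch, assuming an OpenMP 4.5+ toolchain
> with a configured offload target; names are illustrative):
>
>   program demo_omp_offload
>     implicit none
>     integer :: i, n
>     real :: y(1000)
>     n = 1000
>     y = 0.0
>     ! Map y to the device, run the loop across teams and threads,
>     ! and copy the result back to the host.
>     !$omp target teams distribute parallel do map(tofrom: y)
>     do i = 1, n
>       y(i) = real(i)
>     end do
>     !$omp end target teams distribute parallel do
>     print *, y(n)   ! expected: 1000.0
>   end program demo_omp_offload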
>
> Timeline of the Project:
>
> Adding Patches & Understanding Code (April 3 - April 30)
>
> * Contribute minor patches and bug fixes to gain a deeper
> understanding of the codebase.
> * Study the code organisation, data structures, and compilation
> phases related to DO CONCURRENT.
>
> Community Bonding Period (May 1 - May 26)
>
> * Familiarize myself with the GFortran codebase, the Fortran language
> standards, and existing implementations of `DO CONCURRENT` in other
> compilers.
> * Discuss project goals and implementation details with the mentor,
> clarifying doubts or concerns.
> * Set up the development environment and ensure all necessary tools
> and dependencies are in place.
>
> Week 1-2: Parsing and AST Construction (May 27 - June 9)
>
> * Extend the GFortran compiler's parser to recognize the new syntax
> for `DO CONCURRENT` loops, locality clauses, and reduction clauses.
> * Modify the grammar rules and implement parsing actions to correctly
> parse these constructs.
> * Construct an AST that accurately represents the `DO CONCURRENT`
> construct and its associated clauses.
>
> Week 3-4: Semantic Analysis and Dependency Resolution (June 10 -
> June 23)
>
> * Implement semantic analysis techniques like data dependency
> analysis, loop-carried dependency analysis, and alias analysis.
> * Analyze the AST to identify data dependencies and potential
> parallelization opportunities.
> * Resolve data dependencies and ensure the correctness of `DO
> CONCURRENT` loop execution.
>
> Week 5-6: Code Generation and Transformation (June 24 - July 7)
>
> * Generate optimized code for parallel execution of `DO CONCURRENT`
> loops, respecting locality clauses and reduction operations.
> * Implement techniques such as loop distribution, loop fission, loop
> fusion, and the use of synchronization primitives.
>
> Week 7-10: OpenMP-based Parallelization and OpenMP Offloading
> (July 8 - August 4)
>
> * Implement the OpenMP-based parallelization strategy for `DO
> CONCURRENT` loops on shared-memory systems.
> * Generate code to create OpenMP parallel regions, distribute
> iterations across threads, and handle synchronization and reduction
> operations.
> * Implement the OpenMP offloading strategy for offloading `DO
> CONCURRENT` loops to accelerator devices like GPUs.
>
> Week 11: Performance Optimization (August 5 - August 12)
>
> * Implement techniques to optimize the performance of parallelized
> `DO CONCURRENT` loops, like loop tiling, data prefetching, and
> minimizing synchronization overhead.
>
> Week 12: Testing, Benchmarking, and Documentation (August 13 -
> August 19)
>
> * Finalize a comprehensive test suite to validate the correctness of
> the implementation, covering various use cases and edge scenarios.
> * Document the project, including implementation details, performance
> results, and any relevant findings or limitations.
>
> About Me:
>
> * Name - Anuj Mohite
> * University - College of Engineering Pune Technological University
> * Personal Email - anujmohite001@gmail.com
> * University Email - mohitear21.comp@coeptech.ac.in
> * GitHub - https://www.github.com/anujrmohite
> * Time Zone - IST (GMT+05:30), India
> * Country & City - Pune, India
> * Preferred Language for communication - English
>
> Academic Background:
>
> * Pursuing a Bachelor's degree in Computer Science and Engineering at
> the College of Engineering Pune, Technological University.
> * My programming journey began during the first year of my high
> school Diploma in 2018, with self-taught C/C++ for Embedded Systems
> programming.
>
> Current Studies and Work:
>
> * Working as a Generalist Engineering Intern at Syrma SGS,
> contributing to electronic hardware and software product development
> for Embedded Systems, with expected work hours of 16 - 20 per week.
> * Currently responsible for developing a custom Linux-based
> distribution for Automotive Applications.
>
> Compiler-related Coursework:
>
> * Took Compiler Construction theory and laboratory courses as part of
> the college curriculum and completed the assignments (GitHub link:
> click here).
> * Learned about the different phases of compilation, various
> optimization techniques, etc. (course syllabus GitHub link: click
> here).
>
> Future Aspirations:
>
> * I wish to work with GCC this summer as a GSoC student, committing
> around 7 - 8 hours/day and around 40 - 50 hours/week.
> * I believe I possess the necessary skills to undertake this project.
> * I hope to make significant contributions to GCC this summer and to
> be a part of GCC in the future.
>
> My experience with GCC:
>
> I'm part of the Free Software Users Group (CoFSUG) at my college,
> COEP. We're a group of students who are really into exploring Free
> and Open Source Software (FOSS). We've been digging into how UNIX,
> GNU, and eventually GNU/Linux came to be, reading about their journey
> from the early days. Because of this interest, I got really into the
> GCC project and how it's always evolving. I started reaching out to
> the GCC community, like Martin and Jerry, to participate in Summer of
> Code. I also checked out the Insights On GFortran Mattermost space,
> which helped me learn how to build, test, and debug the GCC code.
> Now I'm interested in implementing the `DO CONCURRENT` feature in
> GFortran, and I'm dedicated to working on it. The discussions
> happening on Bugzilla and the GCC mailing lists are teaching me a lot
> about overall development, and I'm happy and enthusiastic to be a
> part of it.
>
> Post GSoC:
>
> My genuine interest in compiler development drives me to actively
> contribute to GCC. I will stay updated with GCC's advancements and
> contribute to its evolution. Furthermore, I will be available for any
> future enhancements or extensions related to this project.