Hi, I'm Anuj M, an undergraduate student interested in participating in GSoC 2024 with GCC. I would like to work on the project improving the `DO CONCURRENT` construct in the GFortran compiler. The current implementation in GFortran has limitations in handling locality clauses, supporting reduction operations, and parallelizing `DO CONCURRENT` loops. The proposal aims to address these limitations by:

1. Implementing locality clauses and ensuring correct handling of data dependencies.
2. Supporting reduction operations in `DO CONCURRENT` loops.
3. Developing parallelization strategies, including OpenMP-based parallelization and OpenMP offloading.

I have added a detailed project proposal outlining the implementation approach, the timeline, and my relevant background and experience. I would greatly appreciate feedback or suggestions from the GCC community regarding this proposal.

Best regards,
Anuj M

## GCC, the GNU Compiler Collection - Google Summer Of Code 24 Proposal - Anuj Mohite

Project: Fortran - DO CONCURRENT

Abstract:
The `DO CONCURRENT` construct, introduced in the Fortran 2008 standard and extended with locality clauses in Fortran 2018, provides a mechanism for expressing parallelism in Fortran programs. Fully leveraging its potential, however, requires a systematic and comprehensive implementation within Fortran compilers. This proposal outlines a solution for implementing `DO CONCURRENT` support, encompassing the parsing and handling of locality clauses, support for reduction operations, and parallelization strategies based on OpenMP, together with performance optimization techniques to ensure efficient parallel execution. By enabling efficient parallelization of `DO CONCURRENT` loops, the project aims to strengthen Fortran's continued relevance in high-performance computing.

Current State of Feature:
At present, support for the `DO CONCURRENT` construct in the GFortran compiler is limited. The existing implementation only partially handles the locality clauses introduced in the Fortran 2018 standard, and it lacks support for reduction operations and for parallelization strategies, so the performance gains achievable through `DO CONCURRENT` are not fully realized. GFortran currently provides a basic parser for the construct and its locality clauses, but the semantic analysis and code generation phases are incomplete, leading to incorrect handling of data dependencies and potential race conditions. The compiler also does not support reduction operations or any parallelization strategy for `DO CONCURRENT` loops, effectively executing them serially.

Other Fortran compilers, such as NVIDIA's nvfortran and Intel's ifort, have implemented varying levels of support for `DO CONCURRENT`, but their implementations often have limitations or restrictions, and their performance varies with the workload and hardware architecture. Furthermore, as the Fortran language continues to evolve, with the Fortran 2023 standard (formerly known as Fortran 202x) introducing additional features related to `DO CONCURRENT`, such as the `REDUCE` locality specifier, it is crucial for compilers to stay up to date and provide comprehensive support for these language features.
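For reference, below is a minimal example of the Fortran 2018 locality clauses this project targets; the variable names are illustrative. Each iteration receives its own copy of `tmp` via `local`, the arrays `a` and `b` are shared across iterations, and `default(none)` forces the locality of every variable used in the loop to be stated explicitly:

```fortran
program locality_example
  implicit none
  integer :: i
  real :: tmp
  real :: a(100), b(100)

  call random_number(a)

  ! Fortran 2018 locality clauses: tmp is private to each
  ! iteration, a and b are shared, and default(none) requires
  ! an explicit locality for every variable in the loop body.
  do concurrent (i = 1:100) default(none) local(tmp) shared(a, b)
    tmp = 2.0 * a(i)
    b(i) = tmp + 1.0
  end do

  print *, b(1), b(100)
end program locality_example
```

Correctly parsing these clauses, checking their semantics, and honoring them during code generation is precisely the gap in GFortran described above.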
Project Goals:
The primary goals of this project are:

1. Implement Locality Clauses:
* Extend the GFortran compiler to support the locality clauses specified in the Fortran 2018 standard for the `DO CONCURRENT` construct.
* Cover the parsing, semantic analysis, and code generation phases so that specified data dependencies are handled correctly.
* Modify the compiler's parser to recognize the syntax for `DO CONCURRENT` loops and locality clauses, constructing an accurate AST.
* Enhance the semantic analysis phase to perform data dependency analysis, loop-carried dependency analysis, and alias analysis.
* Resolve data dependencies and identify potential parallelization opportunities.

2. Support Reduction Operations:
* Add support for reduction operations in the `DO CONCURRENT` construct, as introduced in the Fortran 2023 standard (see the sketch after this list).
* Parse reduction clauses, check their correctness during semantic analysis, and generate optimized code for parallel reduction operations.
* Extend the compiler's parser to recognize the syntax for reduction clauses, constructing an accurate AST.
* Enhance the semantic analysis phase to analyze reduction clauses and the loop body, identifying potential dependencies and ensuring the correctness of each reduction operation.
* Employ techniques such as data dependency analysis and alias analysis to accurately identify the variables involved in a reduction and to ensure they are not modified outside the reduction context.

3. Parallelize DO CONCURRENT Loops:
* Develop and integrate parallelization strategies for `DO CONCURRENT` loops into the GFortran compiler, covering OpenMP-based parallelization and OpenMP offloading.
* OpenMP-based parallelization: leverage the OpenMP API to enable thread-based parallelization of `DO CONCURRENT` loops on shared-memory systems; generate code that creates an OpenMP parallel region around the loop, distributes iterations across threads using work-sharing constructs, and handles synchronization and reductions through OpenMP's reduction clauses or atomic operations.
* OpenMP offloading: extend the OpenMP-based parallelization to offload `DO CONCURRENT` loops to accelerator devices such as GPUs using the OpenMP `target` construct; generate code that detects and initializes accelerator devices, transfers data between host and device, produces compute kernels optimized for the accelerator architecture, and handles synchronization and result collection.
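As a minimal sketch of the second goal, the loop below uses the `reduce` locality specifier added in Fortran 2023. GFortran does not accept this syntax yet; the example assumes a compiler with Fortran 2023 `reduce` support, such as recent nvfortran releases:

```fortran
program reduce_example
  implicit none
  integer :: i
  real :: s
  real :: a(1000)

  call random_number(a)
  s = 0.0

  ! Fortran 2023 reduce locality specifier: every iteration
  ! contributes a partial sum to s, and the partial results
  ! are combined with the + operator.
  do concurrent (i = 1:1000) reduce(+:s)
    s = s + a(i)
  end do

  print *, 'sum =', s
end program reduce_example
```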
Implementation:
The proposed implementation involves modifying the GFortran compiler's parser, semantic analyzer, and code generator to handle the `DO CONCURRENT` construct and its associated clauses. The implementation is divided into several phases:

1. Parsing and AST Construction: Extend the parser to recognize the syntax for `DO CONCURRENT` loops, locality clauses, and reduction clauses, constructing an abstract syntax tree (AST) that accurately represents these constructs. This phase involves modifying the Fortran grammar rules and implementing the parsing actions needed to handle the construct's syntax variations, such as the presence or absence of locality clauses, reduction clauses, or both.

2. Semantic Analysis and Dependency Resolution: Analyze the AST constructed during parsing to resolve data dependencies and identify parallelization opportunities. Techniques such as data dependency analysis, loop-carried dependency analysis, alias analysis, polyhedral analysis, and array data-flow analysis will provide accurate dependency information and enable more aggressive optimizations.

3. Code Generation and Transformation: Generate optimized code for the parallel execution of `DO CONCURRENT` loops, taking into account the information gathered during semantic analysis and the specified locality clauses and reduction operations. This may involve techniques such as loop distribution, loop fission, loop fusion, loop blocking, loop unrolling, software pipelining, and the use of synchronization primitives to ensure efficient parallel execution on modern hardware architectures.

4. Parallelization Strategies: Implement the parallelization strategies, generating the code necessary for parallel execution, load balancing, and synchronization (see the sketch after this list).
* OpenMP-based Parallelization: leverage the widely used OpenMP API to enable thread-based parallelization of `DO CONCURRENT` loops on shared-memory systems. This involves generating code to create OpenMP parallel regions around the `DO CONCURRENT` loop and distributing the iterations across the available threads using work-sharing constructs such as `omp parallel do` or `omp parallel loop`, handling synchronization and reduction operations with OpenMP's reduction clauses or atomic operations.
* OpenMP Offloading: extend the OpenMP-based parallelization to offload `DO CONCURRENT` loops to accelerator devices, such as GPUs, using the OpenMP `target` construct. This involves generating code to detect and initialize accelerator devices, transfer the necessary data between host and device, produce compute kernels optimized for the accelerator architecture, and handle synchronization and result collection.
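To make the intended lowering concrete, here is a hand-written sketch of the OpenMP code the compiler could conceptually emit for the reduction loop shown earlier; the real implementation would build the equivalent internal representation rather than source text, and the exact directives chosen are an assumption, not the final design. The first variant targets host threads, the second offloads to an accelerator:

```fortran
program openmp_lowering_sketch
  implicit none
  integer :: i
  real :: s
  real :: a(1000)

  call random_number(a)

  ! Variant 1: thread-based parallelization on the host, one
  ! possible lowering of "do concurrent (i = 1:1000) reduce(+:s)".
  s = 0.0
  !$omp parallel do reduction(+:s)
  do i = 1, 1000
    s = s + a(i)
  end do
  !$omp end parallel do
  print *, 'host sum   =', s

  ! Variant 2: offloading to an accelerator device; the map
  ! clauses spell out the host/device transfers the compiler
  ! would otherwise have to infer and emit.
  s = 0.0
  !$omp target teams distribute parallel do map(to: a) map(tofrom: s) reduction(+:s)
  do i = 1, 1000
    s = s + a(i)
  end do
  !$omp end target teams distribute parallel do
  print *, 'device sum =', s
end program openmp_lowering_sketch
```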
Timeline of the Project:

Adding Patches & Understanding Code (April 3 - April 30)
* Contribute minor patches and bug fixes to gain a deeper understanding of the codebase.
* Study the code organization, data structures, and compilation phases related to `DO CONCURRENT`.

Community Bonding Period (May 1 - May 26)
* Familiarize myself with the GFortran codebase, the Fortran language standards, and existing implementations of `DO CONCURRENT` in other compilers.
* Discuss project goals and implementation details with the mentor, clarifying any doubts or concerns.
* Set up the development environment and ensure all necessary tools and dependencies are in place.

Week 1-2: Parsing and AST Construction (May 27 - June 9)
* Extend the GFortran compiler's parser to recognize the syntax for `DO CONCURRENT` loops, locality clauses, and reduction clauses.
* Modify the grammar rules and implement parsing actions to correctly parse these constructs.
* Construct an AST that accurately represents the `DO CONCURRENT` construct and its associated clauses.

Week 3-4: Semantic Analysis and Dependency Resolution (June 10 - June 23)
* Implement semantic analysis techniques such as data dependency analysis, loop-carried dependency analysis, and alias analysis.
* Analyze the AST to identify data dependencies and potential parallelization opportunities.
* Resolve data dependencies and ensure the correctness of `DO CONCURRENT` loop execution.

Week 5-6: Code Generation and Transformation (June 24 - July 7)
* Generate optimized code for parallel execution of `DO CONCURRENT` loops, respecting locality clauses and reduction operations.
* Implement techniques such as loop distribution, loop fission, loop fusion, and the use of synchronization primitives.

Week 7-10: OpenMP-based Parallelization and OpenMP Offloading (July 8 - August 4)
* Implement the OpenMP-based parallelization strategy for `DO CONCURRENT` loops on shared-memory systems.
* Generate code to create OpenMP parallel regions, distribute iterations across threads, and handle synchronization and reduction operations.
* Implement the OpenMP offloading strategy for offloading `DO CONCURRENT` loops to accelerator devices such as GPUs.

Week 11: Performance Optimization (August 5 - August 12)
* Implement techniques to optimize the performance of parallelized `DO CONCURRENT` loops, such as loop tiling, data prefetching, and minimizing synchronization overhead.

Week 12: Testing, Benchmarking, and Documentation (August 13 - August 19)
* Finalize a comprehensive test suite validating the correctness of the implementation, covering common use cases and edge cases (a sketch of such a test follows below).
* Document the project, including implementation details, performance results, and any relevant findings or limitations.
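As an illustration of the Week 12 testing work, below is a sketch of the kind of DejaGnu test case that could be added under gcc/testsuite/gfortran.dg/. The `dg-` directive style matches the existing gfortran test suite, while the test body itself is hypothetical and assumes the `reduce` support proposed here is in place:

```fortran
! { dg-do run }
! Hypothetical test: a DO CONCURRENT reduction must produce
! the same result as the equivalent serial loop.
program test_do_concurrent_reduce
  implicit none
  integer :: i
  integer :: s, expected

  expected = 0
  do i = 1, 100
    expected = expected + i
  end do

  s = 0
  do concurrent (i = 1:100) reduce(+:s)
    s = s + i
  end do

  ! A nonzero STOP code marks the test as failed.
  if (s /= expected) stop 1
end program test_do_concurrent_reduce
```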
About Me:
* Name: Anuj Mohite
* University: College of Engineering Pune Technological University
* Personal Email: anujmohite001@gmail.com
* University Email: mohitear21.comp@coeptech.ac.in
* GitHub: https://www.github.com/anujrmohite
* Time Zone: IST (GMT+05:30)
* Country & City: Pune, India
* Preferred Language for Communication: English

Academic Background:
* Pursuing a Bachelor's degree in Computer Science and Engineering at College of Engineering Pune Technological University.
* My programming journey began during the first year of my high school Diploma in 2018, with self-taught C/C++ for embedded systems programming.

Current Studies and Work:
* Working as a Generalist Engineering Intern at Syrma SGS, contributing to electronic hardware and software product development for embedded systems, with expected work hours of 16-20 per week.
* Currently responsible for developing a custom Linux-based distribution for automotive applications.

Compiler-related Coursework:
* Completed Compiler Construction theory and laboratory courses as part of the college curriculum, along with their assignments (GitHub link: click here).
* Learned about the different phases of compilation, various optimization techniques, etc. (course syllabus GitHub link: click here).

Future Aspirations:
* I wish to work with GCC this summer as a GSoC student, committing around 7-8 hours/day and 40-50 hours/week.
* I believe I have the necessary skills to undertake this project.
* I hope to make significant contributions to GCC this summer and to remain part of the GCC community in the future.

My Experience with GCC:
I am part of the Free Software Users Group (CoFSUG) at my college, COEP. We are a group of students interested in exploring Free and Open Source Software, studying how UNIX, GNU, and eventually GNU/Linux came to be. Through this interest I became involved with the GCC project and its ongoing evolution. I reached out to members of the GCC community, such as Martin and Jerry, about participating in Summer of Code, and the Insights On GFortran Mattermost space helped me learn how to build, test, and debug the GCC code. I am now keen to implement the `DO CONCURRENT` feature in GFortran and fully committed to working on it. The discussions on Bugzilla and the GCC mailing lists continue to deepen my understanding of GCC development, and I am happy and enthusiastic to be a part of it.

Post GSoC:
My genuine interest in compiler development drives me to contribute actively to GCC. I will stay up to date with GCC's advancements and continue contributing to its evolution, and I will remain available for any future enhancements or extensions related to this project.