GSoC 2024 [Fortran - DO CONCURRENT] Seeking feedback/suggestions for project proposal

From: Anuj Mohite <anujmohite001@gmail.com>
To: gcc@gcc.gnu.org
Cc: Tobias Burnus <tburnus@baylibre.com>, Martin Jambor <mjambor@suse.cz>
Subject: GSoC 2024 [Fortran - DO CONCURRENT] Seeking feedback/suggestions for project proposal
Date: Thu, 28 Mar 2024 00:39:08 +0530
Message-ID: <CAMw23nj6ZW67L2Ju-pdSyuiFW52e-CH=fOn+C6C8qpZS8d+Szg@mail.gmail.com>

Hi,
I'm Anuj M, an undergraduate student interested in participating in GSoC
2024 with GCC. I would like to work on the project to improve the DO
CONCURRENT construct in the GFortran compiler. The current implementation in
GFortran has limitations in handling locality clauses, supporting reduction
operations, and parallelizing DO CONCURRENT loops. The proposal aims to
address these limitations by:

   1. Implementing locality clauses and ensuring correct handling of data
   dependencies.
   2. Supporting reduction operations in DO CONCURRENT loops.
   3. Developing parallelization strategies, including OpenMP-based
   parallelization and OpenMP offloading.

I have added a detailed project proposal outlining the implementation
approach, timeline, my relevant background, and experience.

I would greatly appreciate feedback or suggestions from the GCC community
regarding this project proposal.

Best regards,
Anuj M

## GCC, the GNU Compiler Collection - Google Summer of Code 2024 Proposal - Anuj Mohite

Project: Fortran - DO CONCURRENT

Abstract:

The `DO CONCURRENT` construct, introduced in the Fortran 2008 standard and
extended with locality specifiers in Fortran 2018, provides a mechanism to
express parallelism in Fortran programs. However, fully leveraging its
potential requires a systematic and comprehensive implementation within
Fortran compilers. This proposal outlines a solution for implementing
`DO CONCURRENT` support, encompassing parsing and handling of locality
clauses, enabling reduction operations, and developing parallelization
strategies utilising OpenMP.

To ensure efficient parallel execution, performance optimization techniques
will be employed. By facilitating efficient parallelization of `DO
CONCURRENT` loops, this project aims to strengthen GFortran's support for
high-performance computing workloads.

Current State of Feature:

At present, the support for the `DO CONCURRENT` construct in the GFortran
compiler is limited. The existing implementation only partially handles the
locality clauses introduced in the Fortran 2018 standard, and it lacks
support for reduction operations and parallelization strategies. As a
result, the performance gains achievable through the `DO CONCURRENT`
construct are not fully realised.

The current implementation in GFortran involves a basic parser for the `DO
CONCURRENT` construct and its locality clauses. However, the semantic
analysis and code generation phases are incomplete, leading to incorrect
handling of data dependencies and potential race conditions. Additionally,
the compiler does not support reduction operations or any parallelization
strategies for `DO CONCURRENT` loops, effectively executing them in a
serial manner.

Other Fortran compilers, such as NVIDIA's nvfortran and Intel's ifort, have
implemented varying levels of support for `DO CONCURRENT`. However, their
implementations often have limitations or restrictions, and their
performance can vary depending on the specific workload and hardware
architecture.

Furthermore, the Fortran language continues to evolve: the Fortran 2023
standard (developed under the working name Fortran 202x) introduces
additional features and enhancements related to the `DO CONCURRENT`
construct, so it is crucial for compilers to stay up-to-date and provide
comprehensive support for these language features.

Project Goals:

The primary goals of this project are:

1. Implement Locality Clauses:

* Extend the GFortran compiler to support locality clauses specified in the
Fortran 2018 standard for the `DO CONCURRENT` construct.
* Cover the parsing, semantic analysis, and code generation phases so that
the specified data dependencies are handled correctly.
* Modify the compiler's parser to recognize new syntax for `DO CONCURRENT`
loops and locality clauses, constructing an accurate AST.
* Enhance semantic analysis phase to perform data dependency analysis,
loop-carried dependency analysis, and alias analysis.
* Resolve data dependencies and identify potential parallelization
opportunities.
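
As a minimal sketch of the kind of loop this goal targets, the program below
uses the Fortran 2018 `LOCAL` and `SHARED` locality specifiers. The program
name, array names, and sizes are illustrative assumptions, not taken from the
GFortran testsuite, and compiling it requires a compiler that accepts
locality specifiers.

```fortran
program locality_sketch
  implicit none
  integer, parameter :: n = 8
  integer :: i
  real :: a(n), b(n), tmp

  a = [(real(i), i = 1, n)]
  b = 0.0

  ! LOCAL(tmp): every iteration gets its own copy of tmp, so there is
  ! no race on the scratch variable.
  ! SHARED(a, b): all iterations refer to the same arrays; each
  ! iteration touches only element i.
  do concurrent (i = 1:n) local(tmp) shared(a, b)
    tmp = 2.0 * a(i)
    b(i) = tmp + 1.0
  end do

  ! Self-check: b(i) must equal 2*i + 1 for every i.
  if (any(b /= [(2.0 * real(i) + 1.0, i = 1, n)])) error stop "unexpected result"
  print *, b
end program locality_sketch
```

Without `LOCAL(tmp)`, the compiler would have to prove on its own that `tmp`
can be privatized before running the iterations in parallel.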

2. Support Reduction Operations:

* Add support for reduction operations in the `DO CONCURRENT` construct, as
introduced in the Fortran 2023 standard (developed as Fortran 202x).
* This involves parsing reduction clauses, semantic analysis for
correctness, and generating optimized code for parallel reduction operations.
* Extend the compiler's parser to recognize new syntax for reduction
clauses, constructing an accurate AST.
* Enhance semantic analysis phase to analyze reduction clauses and loop
body, identifying potential dependencies and ensuring correctness of
reduction operation.
* Employ techniques like data dependency analysis and alias analysis to
accurately identify variables involved in reduction operation and ensure
they are not modified outside reduction context.
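
For illustration, a reduction in `DO CONCURRENT` could look like the sketch
below, which uses the Fortran 2023 `REDUCE` locality specifier. The program
and variable names are hypothetical, and the example assumes a compiler that
already accepts `REDUCE`.

```fortran
program reduce_sketch
  implicit none
  integer, parameter :: n = 100
  integer :: i, total

  total = 0

  ! REDUCE(+:total): each iteration accumulates into a private copy of
  ! total; the copies are combined with "+" when the loop completes.
  do concurrent (i = 1:n) reduce(+:total)
    total = total + i
  end do

  ! Self-check against the closed form n*(n+1)/2.
  if (total /= n * (n + 1) / 2) error stop "unexpected sum"
  print *, "sum of 1..n =", total
end program reduce_sketch
```

Inside the loop, `total` may only appear in the reduction statement itself,
which is the property the semantic analysis described above has to verify.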

3. Parallelize DO CONCURRENT Loops:

* Develop and integrate parallelization strategies for `DO CONCURRENT`
loops into the GFortran compiler.
* These strategies include OpenMP-based parallelization and OpenMP
offloading.

OpenMP-based Parallelization:

* Leverage OpenMP API to enable thread-based parallelization of `DO
CONCURRENT` loops on shared-memory systems.
* Generate code to create OpenMP parallel regions around each `DO
CONCURRENT` loop and distribute iterations across threads using
work-sharing constructs.
* Handle synchronization and reduction operations using OpenMP's reduction
clauses or atomic operations.

OpenMP Offloading:

* Extend OpenMP-based parallelization to support offloading `DO CONCURRENT`
loops to accelerator devices like GPUs, using OpenMP target construct.
* Generate code to detect and initialize accelerator devices and to
transfer data between host and device.
* Generate compute kernels optimized for the accelerator architecture, and
handle synchronization and result collection.

Implementation:

The proposed implementation involves modifying the GFortran compiler's
parser, semantic analyzer, and code generator to handle the `DO CONCURRENT`
construct and its associated clauses. The implementation is divided into
several phases:

1. Parsing and AST Construction: Extend the parser to recognize the new
syntax for `DO CONCURRENT` loops, locality clauses, and reduction clauses,
constructing an abstract syntax tree (AST) that accurately represents these
constructs.
  This phase will involve modifying the Fortran grammar rules and
implementing the necessary parsing actions to correctly parse the `DO
CONCURRENT` construct and its associated clauses. The parser will need to
handle various syntax variations, such as the presence or absence of
locality clauses, reduction clauses, or both.

2. Semantic Analysis and Dependency Resolution: Implement semantic analysis
techniques, such as data dependency analysis, loop-carried dependency
analysis, alias analysis, polyhedral analysis, and array data-flow
analysis, to resolve data dependencies and identify potential
parallelization opportunities accurately.
  The semantic analysis phase will operate on the AST constructed during
the parsing phase. The more advanced techniques in this list (polyhedral
analysis and array data-flow analysis) are intended to provide more accurate
dependency information and enable more aggressive optimizations.
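
To make the notion of a loop-carried dependency concrete, the sketch below
contrasts a loop whose iterations are independent with one that is not. The
array names are illustrative assumptions; the second loop is deliberately
written as an ordinary `DO`, because expressing it as `DO CONCURRENT` would
be non-conforming.

```fortran
program dependency_sketch
  implicit none
  integer, parameter :: n = 10
  integer :: i
  integer :: a(n), squares(n), prefix(n)

  a = [(i, i = 1, n)]

  ! Independent iterations: iteration i reads a(i) and writes only
  ! squares(i), so any execution order gives the same result.
  do concurrent (i = 1:n)
    squares(i) = a(i) * a(i)
  end do

  ! Loop-carried dependency: iteration i reads prefix(i-1), which is
  ! written by iteration i-1, so the iterations must run in order.
  prefix(1) = a(1)
  do i = 2, n
    prefix(i) = prefix(i-1) + a(i)
  end do

  ! Self-checks against closed forms.
  if (squares(n) /= n * n) error stop "unexpected square"
  if (prefix(n) /= n * (n + 1) / 2) error stop "unexpected prefix sum"
  print *, squares(n), prefix(n)
end program dependency_sketch
```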

3. Code Generation and Transformation: Generate optimized code for parallel
execution of `DO CONCURRENT` loops, respecting the specified locality
clauses and reduction operations. This may involve techniques such as loop
distribution, loop fission, loop fusion, loop blocking, loop unrolling,
software pipelining, and the use of synchronization primitives.
  The code generation phase will use the information gathered during the
semantic analysis phase, together with the specified locality clauses and
reduction operations, to choose among these transformations and ensure
efficient parallel execution on modern hardware architectures.

4. Parallelization Strategies: Implement two parallelization strategies,
OpenMP-based parallelization and OpenMP offloading. These strategies will
involve generating the necessary code for parallel execution, load
balancing, and synchronization.

*   OpenMP-based Parallelization:

The OpenMP-based parallelization strategy will leverage the widely-used
OpenMP API to enable thread-based parallelization of `DO CONCURRENT` loops
on shared-memory systems. This will involve generating code to create
OpenMP parallel regions around the `DO CONCURRENT` loop, distributing the
iterations across available threads using work-sharing constructs such as
`omp parallel do` or `omp parallel loop`. The implementation will also
handle synchronization and reduction operations using OpenMP's reduction
clauses or atomic operations.
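
The intended effect can be sketched by writing the OpenMP form by hand next
to the `DO CONCURRENT` form. This is only an illustration of the semantics,
not the code GFortran would actually emit; the program and variable names
are assumptions. Built without OpenMP support, the directives are treated as
comments and the program still runs serially.

```fortran
program omp_analogue_sketch
  implicit none
  integer, parameter :: n = 1000
  integer :: i, s_omp
  integer :: a(n), b_dc(n), b_omp(n)

  a = [(i, i = 1, n)]

  ! Source form: the iterations may execute in any order.
  do concurrent (i = 1:n)
    b_dc(i) = 2 * a(i) + 1
  end do

  ! Hand-written OpenMP analogue: a parallel region plus a work-sharing
  ! loop, with the sum handled by a reduction clause.
  s_omp = 0
  !$omp parallel do default(none) shared(a, b_omp) private(i) reduction(+:s_omp)
  do i = 1, n
    b_omp(i) = 2 * a(i) + 1
    s_omp = s_omp + b_omp(i)
  end do
  !$omp end parallel do

  ! Self-checks: both forms must agree.
  if (any(b_dc /= b_omp)) error stop "loop results differ"
  if (s_omp /= sum(b_dc)) error stop "reduction result differs"
  print *, "sum =", s_omp
end program omp_analogue_sketch
```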

*   OpenMP Offloading:

The OpenMP offloading strategy will extend the OpenMP-based parallelization
to support offloading `DO CONCURRENT` loops to accelerator devices, such as
GPUs, using the OpenMP target construct. This will involve generating code
to detect and initialize accelerator devices, transfer necessary data
between the host and the device, generate compute kernels optimized for the
accelerator architecture, and handle synchronization and result collection.
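
A hand-written analogue of an offloaded loop might look like the sketch
below. The `map` clauses and the choice of the combined `target teams
distribute parallel do` construct are assumptions made for illustration; the
proposal does not fix which constructs the compiler would generate. If no
device is available, or OpenMP is disabled, the loop runs on the host.

```fortran
program offload_sketch
  implicit none
  integer, parameter :: n = 1024
  real, parameter :: alpha = 2.0
  integer :: i
  real :: x(n), y(n)

  x = [(real(i), i = 1, n)]
  y = 1.0

  ! map(to: x): copy x to the device before the loop.
  ! map(tofrom: y): copy y to the device and back to the host afterwards.
  !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
  do i = 1, n
    y(i) = alpha * x(i) + y(i)
  end do
  !$omp end target teams distribute parallel do

  ! Self-check: y(i) must equal 2*i + 1 for every i.
  if (any(y /= [(2.0 * real(i) + 1.0, i = 1, n)])) error stop "unexpected result"
  print *, "y(1), y(n) =", y(1), y(n)
end program offload_sketch
```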

Timeline of the Project:

Adding Patches & Understanding Code (April 3 - April 30)

* Contribute minor patches and bug fixes to gain deeper codebase
understanding.
* Study the code organisation, data structures, and compilation phases
related to DO CONCURRENT.

Community Bonding Period (May 1 - May 26)

* Familiarize myself with the GFortran codebase, Fortran language
standards, and existing implementations of `DO CONCURRENT` in other
compilers.
* Discuss project goals and implementation details with the mentor,
clarifying doubts or concerns.
* Set up the development environment and ensure all necessary tools and
dependencies are in place.

Week 1-2: Parsing and AST Construction (May 27 - June 9)

* Extend the GFortran compiler's parser to recognize the new syntax for `DO
CONCURRENT` loops, locality clauses, and reduction clauses.
* Modify the grammar rules and implement parsing actions to correctly parse
these constructs.
* Construct an AST that accurately represents the `DO CONCURRENT` construct
and its associated clauses.

Week 3-4: Semantic Analysis and Dependency Resolution (June 10 - June 23)

* Implement semantic analysis techniques like data dependency analysis,
loop-carried dependency analysis, and alias analysis.
* Analyze the AST to identify data dependencies and potential
parallelization opportunities.
* Resolve data dependencies and ensure the correctness of the `DO
CONCURRENT` loop execution.

Week 5-6: Code Generation and Transformation (June 24 - July 7)

* Generate optimized code for parallel execution of `DO CONCURRENT` loops,
respecting locality clauses and reduction operations.
* Implement techniques such as loop distribution, loop fission, loop
fusion, and the use of synchronization primitives.

Week 7-10: OpenMP-based Parallelization and OpenMP Offloading (July 8 -
August 4)

* Implement the OpenMP-based parallelization strategy for `DO CONCURRENT`
loops on shared-memory systems.
* Generate code to create OpenMP parallel regions, distribute iterations
across threads, and handle synchronization and reduction operations.
* Implement the OpenMP offloading strategy for offloading `DO CONCURRENT`
loops to accelerator devices like GPUs.

Week 11: Performance Optimization (August 5 - August 12)

* Implement techniques to optimize the performance of parallelized `DO
CONCURRENT` loops, such as loop tiling, data prefetching, and minimizing
synchronization overhead.

Week 12: Testing, Benchmarking, and Documentation (August 13 - August 19)

* Generate and finalize the comprehensive test suite to validate the
correctness of the proposed implementation, covering various use cases and
edge scenarios.
* Document the project, including implementation details, performance
results, and any relevant findings or limitations.

About Me:

* Name: Anuj Mohite
* University: College of Engineering Pune Technological University
* Personal Email: anujmohite001@gmail.com
* University Email: mohitear21.comp@coeptech.ac.in
* GitHub: https://www.github.com/anujrmohite
* Time Zone: IST (GMT+05:30)
* Country & City: Pune, India
* Preferred Language for communication: English

Academic Background:
* Pursuing a Bachelor's degree in Computer Science and Engineering from the
College of Engineering Pune, Technological University.
* Began programming in 2018, during the first year of my high school
Diploma, teaching myself C/C++ for Embedded Systems programming.

Current Studies and Work:
* Working as a Generalist Engineering Intern at Syrma SGS, contributing to
Electronic hardware and software product development for Embedded Systems,
with expected work hours of 16 - 20 per week.
* Currently responsible for developing a custom Linux-based distribution
system for Automotive Applications.

Compiler-related Coursework:
* Taken Compiler Construction theory and laboratory courses as part of the
college curriculum. Completed assignments (GitHub link: click here).
* Learned about different phases of compilation, various optimization
techniques, etc. (Course syllabus Github Link: click here).

Future Aspirations:
* Wish to work with GCC this summer as a GSoC student, committing around
7-8 hours per day and around 40-50 hours per week.
* Believe I possess the necessary skills to undertake this project.
* Hope to make significant contributions to GCC this summer and be a part
of GCC in the future.

My experience with GCC:

I'm part of the Free Software Users Group (CoFSUG) at my college, COEP.
We're a group of students who are really into exploring the Free and Open
Source Software (FOSS) ecosystem. We've been digging into how UNIX, GNU,
and eventually GNU/Linux came to be, reading about their journey from the
early days. Because of this newfound interest, I got really into the GCC
project and how it's always evolving. I started reaching out to members of
the GCC community, such as Martin and Jerry, about participating in Summer
of Code. I also checked out the Insights On GFortran Mattermost space, which
helped me learn how to build, test, and debug the GCC code.

Now, I'm interested in implementing the `DO CONCURRENT` feature in
GFortran, and I'm dedicated to working on it. The discussions happening on
Bugzilla and the GCC mailing lists keep adding to my knowledge of overall
development, and I'm happy and enthusiastic to be a part of it.

Post GSoC:

My genuine interest in compiler development drives me to actively
contribute to GCC. I will stay updated with GCC's advancements and
contribute to its evolution. Furthermore, I will be available for any
future enhancements or extensions related to this project.

References:

[1] Can Fortran's 'do concurrent' replace directives for accelerated
computing?
[2] https://arxiv.org/catchup?smonth=10&group=grp_&sday=21&num=50&archive=cs&method=without&syear=2021
[3] OpenMP Architecture Review Board. (2018). OpenMP Application
Programming Interface Version 5.0.
[4] OpenACC-Standard.org. (2015). The OpenACC Application Programming
Interface Version 2.5.
[5] Mellor-Crummey, J., & Scott, M. L. (1991). Algorithms for scalable
synchronization on shared-memory multiprocessors.
[6] Satish, N., Harris, M., & Garland, M. (2009). Designing efficient
sorting algorithms for manycore GPUs.
[7] Stratton, J. A., Rodrigues, C., Sung, I. J., Obeid, N., Chang, L. W.,
Anssari, N., ... & Hwu, W. M. (2012). Parboil: A revised benchmark suite
for scientific and commercial throughput computing. Center for Reliable and
High-Performance Computing, 127.
[8] Deville, N., Hammer, M., KRAFTIS, J., O'KEEFE, M., Chapman, B., &
Witting, K. (2022). OpenMP Technical Report 9 on OpenMP and Accelerators.
OpenMP Architecture Review Board.
[9] DO CONCURRENT isn’t necessarily concurrent
