public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [GSoC'19, libgomp work-stealing] Task parallelism runtime
@ 2019-06-03 18:01 김규래
  2019-06-03 18:21 ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: 김규래 @ 2019-06-03 18:01 UTC (permalink / raw)
  To: gcc; +Cc: jakub

Hi,
I've been studying the libgomp task parallelism system.
I have a few questions.
First, tracing the events shows that only the main thread calls GOMP_task.
How do the other worker threads enter the libgomp runtime?
I can't find the entry point of the worker threads in the event tracing or the assembly dump.
Second, how is the task priority set?
 
Ray Kim

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-03 18:01 [GSoC'19, libgomp work-stealing] Task parallelism runtime 김규래
@ 2019-06-03 18:21 ` Jakub Jelinek
  2019-06-05 18:25   ` 김규래
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2019-06-03 18:21 UTC (permalink / raw)
  To: 김규래; +Cc: gcc

On Tue, Jun 04, 2019 at 03:01:13AM +0900, 김규래 wrote:
> Hi,
> I've been studying the libgomp task parallelism system.
> I have a few questions.
> First, tracing the events shows that only the main thread calls GOMP_task.

No, any thread can call GOMP_task, in particular the thread that encountered
the #pragma omp task construct.

The GOMP_task function then decides based on the clauses of the construct
(passed in various ways through the arguments of that function) whether it
will be included (executed by the encountering thread), or queued for
later execution.  In the latter case, it will be scheduled during a barrier
(implicit or explicit), see gomp_barrier_handle_tasks called from the
bar.[ch] code, or at other spots, e.g. during taskwait construct
(GOMP_taskwait) or at the end of taskgroup (GOMP_taskgroup_end).
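The control flow described above can be sketched as follows. This is a hypothetical, greatly simplified illustration, not the actual libgomp code; `task_sketch`, `enqueue_task`, and `barrier_handle_tasks` are made-up stand-ins for GOMP_task, the per-team queue, and gomp_barrier_handle_tasks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*task_fn) (void *);

/* Trivial stand-in for the per-team task queue.  */
#define QCAP 16
static struct { task_fn fn; void *data; } queue[QCAP];
static size_t qlen;

static void
enqueue_task (task_fn fn, void *data)
{
  if (qlen < QCAP)
    {
      queue[qlen].fn = fn;
      queue[qlen].data = data;
      qlen++;
    }
}

/* Hypothetical sketch of the decision GOMP_task makes based on the
   construct's clauses; the real code handles many more cases.  */
static void
task_sketch (task_fn fn, void *data, bool if_clause_true, bool must_include)
{
  if (!if_clause_true || must_include)
    fn (data);                /* included: run on the encountering thread */
  else
    enqueue_task (fn, data);  /* deferred: scheduled later at a barrier,
                                 taskwait, or taskgroup end */
}

/* Stand-in for gomp_barrier_handle_tasks: drain the queue.  */
static void
barrier_handle_tasks (void)
{
  while (qlen > 0)
    {
      qlen--;
      queue[qlen].fn (queue[qlen].data);
    }
}

static int hits;
static void bump (void *p) { (void) p; hits++; }

/* Demo: one deferred task (run at the "barrier"), one included task.  */
static int
run_demo (void)
{
  task_sketch (bump, NULL, true, false);   /* deferred, queued */
  barrier_handle_tasks ();                 /* runs it */
  task_sketch (bump, NULL, false, false);  /* if(0) => included, runs now */
  return hits;
}
```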

> How do the other worker threads enter the libgomp runtime?

If you never encounter a parallel, teams or target construct, then there is
just one thread that does everything (well, the library is written such that
if you call pthread_create by hand, each such thread acts as a separate
initial thread from the OpenMP POV).
Threads are created e.g. during a parallel construct (GOMP_parallel), where
for non-nested parallelism, as the standard requires, existing threads are
reused if possible or new ones are spawned; see mainly team.c
(gomp_team_start) for the function that spawns new threads or wakes the
ones waiting for work, and gomp_thread_start in the same file for the
function actually run by the threads libgomp creates.

> I can't find the entry point of the worker threads in the event tracing or the assembly dump.
> Second, how is the task priority set?

By the user, through priority clause, passed to GOMP_task and then taken
into account when handling tasks in the various queues.
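As context, the priority clause looks like this in user code. This is standard OpenMP usage, but the function and variable names below are illustrative only; the priority value is merely a scheduling hint. Compile with -fopenmp; without it the pragmas are ignored and the loop simply runs serially:

```c
#include <assert.h>

/* Each task gets a priority hint; the runtime should prefer to run
   higher-priority deferred tasks first, but is not required to.  */
static void
spawn_prioritized (int *results, int n)
{
  #pragma omp parallel
  #pragma omp single
  for (int i = 0; i < n; i++)
    #pragma omp task firstprivate (i) shared (results) priority (i)
    results[i] = i * i;
  /* The implicit barrier at the end of the parallel region guarantees
     all queued tasks have run by the time we return.  */
}
```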

	Jakub

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-03 18:21 ` Jakub Jelinek
@ 2019-06-05 18:25   ` 김규래
  2019-06-05 18:52     ` Jakub Jelinek
  2019-06-05 19:06     ` Janne Blomqvist
  0 siblings, 2 replies; 10+ messages in thread
From: 김규래 @ 2019-06-05 18:25 UTC (permalink / raw)
  To: gcc; +Cc: Jakub Jelinek

Hi, thanks for the detailed explanation.
I think I now get the picture.
Judging from my current understanding, the task parallelism currently works as follows:
1. Tasks are placed in a global shared queue.
2. Workers consume the tasks by polling the queue in a while loop, much like self-scheduling (dynamic scheduling).
 
Then the improvements, including work-stealing, would be:
1. Each worker holds a dedicated task queue, reducing resource contention.
2. The tasks are distributed in a round-robin fashion.
3. Work-stealing resolves the load imbalance.
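A minimal, single-threaded sketch of items 1-3 above (hypothetical code; a real implementation needs per-queue synchronization, e.g. a lock-free Chase-Lev deque, and none of these names exist in libgomp):

```c
#include <assert.h>

#define NWORKERS 4
#define QCAP 64

/* Owner pushes/pops LIFO at the tail; thieves take FIFO at the head.
   No synchronization here: single-threaded illustration only.  */
struct deque { int buf[QCAP]; int head, tail; };

static struct deque queues[NWORKERS];
static int rr_next;   /* round-robin cursor for task distribution */

static void
submit_task (int task)                        /* item 2: round-robin */
{
  struct deque *d = &queues[rr_next];
  rr_next = (rr_next + 1) % NWORKERS;
  d->buf[d->tail++ % QCAP] = task;
}

static int
pop_local (int self)                          /* item 1: dedicated queue */
{
  struct deque *d = &queues[self];
  return d->tail > d->head ? d->buf[--d->tail % QCAP] : -1;
}

static int
steal (int self)                              /* item 3: work-stealing */
{
  for (int v = 0; v < NWORKERS; v++)
    if (v != self && queues[v].tail > queues[v].head)
      return queues[v].buf[queues[v].head++ % QCAP];
  return -1;   /* nothing to steal */
}
```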
 
If the above statements are correct, I guess the task priority should be given some special treatment?
 
Ray Kim
 
-----Original Message-----
From: "Jakub Jelinek"<jakub@redhat.com>
To: "김규래"<msca8h@naver.com>;
Cc: <gcc@gcc.gnu.org>;
Sent: 2019-06-04 (화) 03:21:01 (GMT+09:00)
Subject: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
 
On Tue, Jun 04, 2019 at 03:01:13AM +0900, 김규래 wrote:
> Hi,
> I've been studying the libgomp task parallelism system.
> I have a few questions.
> First, tracing the events shows that only the main thread calls GOMP_task.

No, any thread can call GOMP_task, in particular the thread that encountered
the #pragma omp task construct.

The GOMP_task function then decides based on the clauses of the construct
(passed in various ways through the arguments of that function) whether it
will be included (executed by the encountering thread), or queued for
later execution.  In the latter case, it will be scheduled during a barrier
(implicit or explicit), see gomp_barrier_handle_tasks called from the
bar.[ch] code, or at other spots, e.g. during taskwait construct
(GOMP_taskwait) or at the end of taskgroup (GOMP_taskgroup_end).

> How do the other worker threads enter the libgomp runtime?

> If you never encounter a parallel, teams or target construct, then there is
> just one thread that does everything (well, the library is written such that
> if you call pthread_create by hand, each such thread acts as a separate
> initial thread from the OpenMP POV).
> Threads are created e.g. during a parallel construct (GOMP_parallel), where
> for non-nested parallelism, as the standard requires, existing threads are
> reused if possible or new ones are spawned; see mainly team.c
> (gomp_team_start) for the function that spawns new threads or wakes the
> ones waiting for work, and gomp_thread_start in the same file for the
> function actually run by the threads libgomp creates.

> I can't find the entry point of the worker threads in the event tracing or the assembly dump.
> Second, how is the task priority set?

By the user, through priority clause, passed to GOMP_task and then taken
into account when handling tasks in the various queues.

Jakub

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-05 18:25   ` 김규래
@ 2019-06-05 18:52     ` Jakub Jelinek
  2019-06-05 19:06     ` Janne Blomqvist
  1 sibling, 0 replies; 10+ messages in thread
From: Jakub Jelinek @ 2019-06-05 18:52 UTC (permalink / raw)
  To: 김규래; +Cc: gcc

On Thu, Jun 06, 2019 at 03:25:24AM +0900, 김규래 wrote:
> Hi, thanks for the detailed explanation.
> I think I now get the picture.
> Judging from my current understanding, the task-parallelism currently works as follows: 
> 1. Tasks are placed in a global shared queue.

It isn't a global shared queue, but a per-team shared queue, in fact 3
different ones, guarded by the same per-team mutex team->task_lock though:
team->task_queue	used on barriers
task->children_queue	used for #pragma omp taskwait
taskgroup->taskgroup_queue	used at the end of #pragma omp taskgroup
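In (heavily elided) structure form, the relationships described above look roughly like this. The field names come from the message; everything else is omitted, and the types here are stand-ins rather than libgomp's real definitions:

```c
#include <assert.h>

/* Stand-in types; libgomp's real gomp_mutex_t and priority_queue
   are more involved.  */
typedef struct { int held; } gomp_mutex_t;
struct priority_queue { int dummy; };

struct gomp_team
{
  gomp_mutex_t task_lock;              /* guards all three queues */
  struct priority_queue task_queue;    /* drained at barriers */
};

struct gomp_task
{
  struct priority_queue children_queue;     /* #pragma omp taskwait */
};

struct gomp_taskgroup
{
  struct priority_queue taskgroup_queue;    /* end of #pragma omp taskgroup */
};
```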

> 2. Workers consume the tasks by polling the queue in a while loop, much like self-scheduling (dynamic scheduling).
>  
> Then the improvements including work-stealing must be done by:
> 1. Each worker holds a dedicated task queue reducing the resource contention.
> 2. The tasks are distributed in a round-robin fashion
> 3. work-stealing will resolve the load imbalance.
>  
> If the above statements are correct, I guess the task priority should be given some special treatment?

Yes, one thing to consider is task priority; another is what to do on those
#pragma omp taskwait or #pragma omp taskgroup waits.  While one can
schedule most of the tasks there (the OpenMP specification has restrictions
on what can be scheduled, but the restrictions are mostly relevant to
untied tasks, which we don't support anyway (well, we ignore untied and
schedule them like tied tasks)), it might be more beneficial (and is what
is currently implemented) to queue only the tasks that help satisfy what we
are waiting on.  Both the priority and the choice of scheduling only task
or taskgroup children instead of all tasks are just hints; the
implementation may choose other tasks.
There is another thing though: tasks with dependencies, where
gomp_task_handle_depend, which needs to look up addresses in hash tables
for the dependencies, also uses the team->task_lock mutex.

	Jakub

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-05 18:25   ` 김규래
  2019-06-05 18:52     ` Jakub Jelinek
@ 2019-06-05 19:06     ` Janne Blomqvist
  2019-06-05 19:42       ` 김규래
  1 sibling, 1 reply; 10+ messages in thread
From: Janne Blomqvist @ 2019-06-05 19:06 UTC (permalink / raw)
  To: 김규래; +Cc: gcc mailing list, Jakub Jelinek

On Wed, Jun 5, 2019 at 9:25 PM 김규래 <msca8h@naver.com> wrote:
>
> Hi, thanks for the detailed explanation.
> I think I now get the picture.
> Judging from my current understanding, the task-parallelism currently works as follows:
> 1. Tasks are placed in a global shared queue.
> 2. Workers consume the tasks by polling the queue in a while loop, much like self-scheduling (dynamic scheduling).
>
> Then the improvements including work-stealing must be done by:
> 1. Each worker holds a dedicated task queue reducing the resource contention.
> 2. The tasks are distributed in a round-robin fashion

For nested task submission (does OpenMP support that?) you probably
want to submit to the local queue rather than round-robin, no?

> 3. work-stealing will resolve the load imbalance.
>
> If the above statements are correct, I guess the task priority should be given some special treatment?
>
> Ray Kim
>
> -----Original Message-----
> From: "Jakub Jelinek"<jakub@redhat.com>
> To: "김규래"<msca8h@naver.com>;
> Cc: <gcc@gcc.gnu.org>;
> Sent: 2019-06-04 (화) 03:21:01 (GMT+09:00)
> Subject: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
>
> On Tue, Jun 04, 2019 at 03:01:13AM +0900, 김규래 wrote:
> > Hi,
> > I've been studying the libgomp task parallelism system.
> > I have a few questions.
> > First, tracing the events shows that only the main thread calls GOMP_task.
>
> No, any thread can call GOMP_task, in particular the thread that encountered
> the #pragma omp task construct.
>
> The GOMP_task function then decides based on the clauses of the construct
> (passed in various ways through the arguments of that function) whether it
> will be included (executed by the encountering thread), or queued for
> later execution.  In the latter case, it will be scheduled during a barrier
> (implicit or explicit), see gomp_barrier_handle_tasks called from the
> bar.[ch] code, or at other spots, e.g. during taskwait construct
> (GOMP_taskwait) or at the end of taskgroup (GOMP_taskgroup_end).
>
> > How do the other worker threads enter the libgomp runtime?
>
> If you never encounter a parallel, teams or target construct, then there is
> just one thread that does everything (well, the library is written such that
> if you call pthread_create by hand, each such thread acts as a separate
> initial thread from the OpenMP POV).
> Threads are created e.g. during a parallel construct (GOMP_parallel), where
> for non-nested parallelism, as the standard requires, existing threads are
> reused if possible or new ones are spawned; see mainly team.c
> (gomp_team_start) for the function that spawns new threads or wakes the
> ones waiting for work, and gomp_thread_start in the same file for the
> function actually run by the threads libgomp creates.
>
> > I can't find the entry point of the worker threads in the event tracing or the assembly dump.
> > Second, how is the task priority set?
>
> By the user, through priority clause, passed to GOMP_task and then taken
> into account when handling tasks in the various queues.
>
> Jakub



-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-05 19:06     ` Janne Blomqvist
@ 2019-06-05 19:42       ` 김규래
  2019-06-05 20:36         ` Janne Blomqvist
  0 siblings, 1 reply; 10+ messages in thread
From: 김규래 @ 2019-06-05 19:42 UTC (permalink / raw)
  To: Janne Blomqvist; +Cc: gcc mailing list, Jakub Jelinek

> On Wed, Jun 5, 2019 at 9:25 PM 김규래 <msca8h@naver.com> wrote: 
>
> > Hi, thanks for the detailed explanation.
> > I think I now get the picture.
> > Judging from my current understanding, the task-parallelism currently works as follows:
> > 1. Tasks are placed in a global shared queue.
> > 2. Workers consume the tasks by polling the queue in a while loop, much like self-scheduling (dynamic scheduling).
> >
> > Then the improvements including work-stealing must be done by:
> > 1. Each worker holds a dedicated task queue reducing the resource contention.
> > 2. The tasks are distributed in a round-robin fashion

> For nested task submission (does OpenMP support that?) you probably
> want to submit to the local queue rather than round-robin, no? 
 
Hi Janne,
I believe you're talking about spawning child tasks from within an already running task, which OpenMP does support.
In this case, pushing to the local queue could introduce a deadlock if the master thread waits on the spawned tasks.
A short-lived one, though, since work-stealing can resolve it.
A better way to handle this is to round-robin the child tasks across the queues, excluding the queue of the thread consuming the current task.
Then waiting on the tasks should never cause a deadlock.
Or did I misunderstand your question?
Maybe there is a specific reason for avoiding round-robin that I missed?
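The placement rule proposed here can be sketched as a small helper (hypothetical code, not existing libgomp; assumes more than one worker):

```c
#include <assert.h>

/* Round-robin child tasks over every queue except the spawning
   worker's own, so that a thread waiting on its children is never
   the sole owner of them.  Assumes nworkers > 1.  */
static int
next_queue (int self, int *rr_cursor, int nworkers)
{
  int q = *rr_cursor % nworkers;
  if (q == self)                 /* skip the spawning worker's own queue */
    q = (q + 1) % nworkers;
  *rr_cursor = q + 1;
  return q;
}
```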
 
Ray Kim

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-05 19:42       ` 김규래
@ 2019-06-05 20:36         ` Janne Blomqvist
  2019-06-06 17:54           ` 김규래
  0 siblings, 1 reply; 10+ messages in thread
From: Janne Blomqvist @ 2019-06-05 20:36 UTC (permalink / raw)
  To: 김규래; +Cc: gcc mailing list, Jakub Jelinek

On Wed, Jun 5, 2019 at 10:42 PM 김규래 <msca8h@naver.com> wrote:

> > On Wed, Jun 5, 2019 at 9:25 PM 김규래 <msca8h@naver.com> wrote:
>
> >
> > > Hi, thanks for the detailed explanation.
> > > I think I now get the picture.
> > > Judging from my current understanding, the task-parallelism currently
> works as follows:
> > > 1. Tasks are placed in a global shared queue.
> > 2. Workers consume the tasks by polling the queue in a while loop,
> much like self-scheduling (dynamic scheduling).
> > >
> > > Then the improvements including work-stealing must be done by:
> > > 1. Each worker holds a dedicated task queue reducing the resource
> contention.
> > > 2. The tasks are distributed in a round-robin fashion
>
> > For nested task submission (does OpenMP support that?) you probably
> > want to submit to the local queue rather than round-robin, no?
>
>
>
> Hi Janne,
>
> I believe you're talking about spawning child tasks within an already
> running task, which I believe is supported by OpenMP.
>
> In this case, pushing to the local queue could introduce a deadlock if the
> master thread waits on the spawned tasks.
>
> A short one though since work-stealing can resolve the deadlock.
>
> A better way to handle this is to round-robin the child tasks to the
> queues excluding the queue of the thread consuming the current task.
>
> Then waiting on the tasks should never cause a deadlock.
>
> Or did I misunderstand your question?
>
> Maybe there is a specific reason for avoiding round-robin that I
> missed?
>
Better cache locality.

Another option, which I guess starts to go out of scope of your gsoc, is
parallel depth first (PDF) search (Blelloch 1999) as an alternative to work
stealing. Here's a presentation about some recent work in this area,
although for Julia and not OpenMP (no idea if PDF would fit with OpenMP at
all): https://www.youtube.com/watch?v=YdiZa0Y3F3c

-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-05 20:36         ` Janne Blomqvist
@ 2019-06-06 17:54           ` 김규래
  0 siblings, 0 replies; 10+ messages in thread
From: 김규래 @ 2019-06-06 17:54 UTC (permalink / raw)
  To: Janne Blomqvist; +Cc: gcc mailing list, Jakub Jelinek

> Another option, which I guess starts to go out of scope of your gsoc, is
> parallel depth first (PDF) search (Blelloch 1999) as an alternative to work
> stealing. Here's a presentation about some recent work in this area,
> although for Julia and not OpenMP (no idea if PDF would fit with OpenMP at
> all): https://www.youtube.com/watch?v=YdiZa0Y3F3c

I am actually aware of PDF and the ongoing work on the Julia side.
Also, I don't think it goes beyond the scope of GSoC,
since the essential goal is to implement a more advanced task-parallel scheduler anyway.

> Better cache locality.
Despite previous research results showing that PDF is better in terms of locality,
recently developed work-stealing (WS) schemes have improved a lot in data locality.
I think an up-to-date quantitative comparison with state-of-the-art algorithms from both sides is needed.

Personally, I think the WS framework is more flexible, and perhaps more popular, right now.
I'd like to hear the opinion of others on the subject.

Ray Kim

^ permalink raw reply	[flat|nested] 10+ messages in thread

*  Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
       [not found] <e2a9f7c55311795785d0f2c47f70acbd@cweb001.nm.nfra.io>
  2019-06-24 19:55 ` 김규래
@ 2019-07-09 12:56 ` 김규래
  1 sibling, 0 replies; 10+ messages in thread
From: 김규래 @ 2019-07-09 12:56 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc

Hi, 
This is an update on my status.
I've been working on unifying the three queues into a single queue.
I'm almost finished and pass all the tests except for the dependency-handling part.
 
Ray Kim

^ permalink raw reply	[flat|nested] 10+ messages in thread

*  Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
       [not found] <e2a9f7c55311795785d0f2c47f70acbd@cweb001.nm.nfra.io>
@ 2019-06-24 19:55 ` 김규래
  2019-07-09 12:56 ` 김규래
  1 sibling, 0 replies; 10+ messages in thread
From: 김규래 @ 2019-06-24 19:55 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc

Hi,
I'm not very familiar with the gomp plugin system.
However, looking at 'GOMP_PLUGIN_target_task_completion', it seems like tasks have to go in and out of the runtime.
In that case, is it right that the tasks have to know which queue they came from?
I think I'll have to add the id of the corresponding queue to each task in the gomp_task structure.
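A sketch of what that could look like. The struct and field names below are hypothetical; gomp_task's real definition has many more members:

```c
#include <assert.h>

/* Hypothetical extension of the task descriptor: remember which
   per-worker queue the task belongs to, so that when it re-enters the
   runtime (e.g. via GOMP_PLUGIN_target_task_completion) it can be
   requeued on the same queue.  */
struct task_sketch
{
  void (*fn) (void *);
  void *data;
  int home_queue_id;   /* id of the owning per-worker queue */
};
```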
 
Ray Kim

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-07-09 12:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-03 18:01 [GSoC'19, libgomp work-stealing] Task parallelism runtime 김규래
2019-06-03 18:21 ` Jakub Jelinek
2019-06-05 18:25   ` 김규래
2019-06-05 18:52     ` Jakub Jelinek
2019-06-05 19:06     ` Janne Blomqvist
2019-06-05 19:42       ` 김규래
2019-06-05 20:36         ` Janne Blomqvist
2019-06-06 17:54           ` 김규래
     [not found] <e2a9f7c55311795785d0f2c47f70acbd@cweb001.nm.nfra.io>
2019-06-24 19:55 ` 김규래
2019-07-09 12:56 ` 김규래

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).