* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  [not found] <e2a9f7c55311795785d0f2c47f70acbd@cweb001.nm.nfra.io>
@ 2019-06-24 19:55 ` 김규래
  2019-06-24 20:13   ` Jakub Jelinek
  2019-07-13  7:46   ` John Pinkerton
  2019-07-09 12:56 ` 김규래
  1 sibling, 2 replies; 12+ messages in thread

From: 김규래 @ 2019-06-24 19:55 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc

Hi,
I'm not very familiar with the gomp plugin system.
However, looking at 'GOMP_PLUGIN_target_task_completion', it seems like tasks have to go in and out of the runtime.
In that case, is it right that the tasks have to know which queue they came from?
I think I'll have to add the id of the corresponding queue to each task in the gomp_task structure.

Ray Kim

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-24 19:55 ` [GSoC'19, libgomp work-stealing] Task parallelism runtime 김규래
@ 2019-06-24 20:13 ` Jakub Jelinek
  2019-07-13  7:46 ` John Pinkerton
  1 sibling, 0 replies; 12+ messages in thread

From: Jakub Jelinek @ 2019-06-24 20:13 UTC (permalink / raw)
To: 김규래; +Cc: gcc

On Tue, Jun 25, 2019 at 04:55:17AM +0900, 김규래 wrote:
> I'm not very familiar with the gomp plugin system.
> However, looking at 'GOMP_PLUGIN_target_task_completion', it seems like tasks have to go in and out of the runtime.
> In that case, is it right that the tasks have to know which queue they came from?
> I think I'll have to add the id of the corresponding queue to each task in the gomp_task structure.

While libgomp has a plugin system, the only supported plugins are those in
the tree, i.e. libgomp/plugin/plugin-{hsa,nvptx}.c and liboffloadmic/plugin/*
The nvptx plugin doesn't have async support ATM, so it is just hsa and
xeonphi offloading that can call it when an asynchronous target region
execution is over.

No matter in which task queue the task is, gomp_target_task_completion needs
to ensure that if something already waits on it (taskwait, taskgroup end,
barrier, dependency wait), it is awakened.

And, like for other parts of task.c, there needs to be a design for what
lock is used to protect any code that needs to be guarded.  The current
code, as you know, uses team->task_lock as a big lock.  I think with the
separate workqueues + work stealing you need a per-implicit-task lock plus
the per-team lock (team->task_lock); design the locking such that there is
no ABBA deadlock possibility and that you use the team task lock only when
really necessary (not sure, but perhaps one example where I really don't
see much way to avoid the per-team lock is task dependencies, the hash
table for that etc.).

	Jakub
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-06-24 19:55 ` [GSoC'19, libgomp work-stealing] Task parallelism runtime 김규래
  2019-06-24 20:13 ` Jakub Jelinek
@ 2019-07-13  7:46 ` John Pinkerton
  1 sibling, 0 replies; 12+ messages in thread

From: John Pinkerton @ 2019-07-13 7:46 UTC (permalink / raw)
To: gcc

unsubscribe

On Mon, Jun 24, 2019, at 3:55 PM, 김규래 wrote:
> Hi,
> I'm not very familiar with the gomp plugin system.
> However, looking at 'GOMP_PLUGIN_target_task_completion', it seems like
> tasks have to go in and out of the runtime.
> In that case, is it right that the tasks have to know which queue
> they came from?
> I think I'll have to add the id of the corresponding queue to each task
> in the gomp_task structure.
>
> Ray Kim
* Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  [not found] <e2a9f7c55311795785d0f2c47f70acbd@cweb001.nm.nfra.io>
  2019-06-24 19:55 ` [GSoC'19, libgomp work-stealing] Task parallelism runtime 김규래
@ 2019-07-09 12:56 ` 김규래
  2019-07-13  6:28 ` Jakub Jelinek
  1 sibling, 1 reply; 12+ messages in thread

From: 김규래 @ 2019-07-09 12:56 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc

Hi,
This is an update on my status.
I've been working on unifying the three queues into a single queue.
I'm almost finished and have passed all the tests except for the dependency handling part.

Ray Kim
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-07-09 12:56 ` 김규래
@ 2019-07-13  6:28 ` Jakub Jelinek
  2019-07-21  7:46 ` 김규래
  0 siblings, 1 reply; 12+ messages in thread

From: Jakub Jelinek @ 2019-07-13 6:28 UTC (permalink / raw)
To: 김규래; +Cc: gcc

On Tue, Jul 09, 2019 at 09:56:00PM +0900, 김규래 wrote:
> Hi,
> This is an update on my status.
> I've been working on unifying the three queues into a single queue.
> I'm almost finished and have passed all the tests except for the dependency handling part.

For dependencies, I can imagine taking a lock on the parent task rather
than a team lock when dealing with the dependency data structures, and
outside of the lock perhaps doing a quick check whether there are any
dependencies at all using an atomic load.  I can't imagine how one could
get away without that, though, and while that can scale well if you have
many tasks that spawn many other tasks, it will still act as a team lock
if, say, all tasks are spawned from the same parent task.

	Jakub
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-07-13  6:28 ` Jakub Jelinek
@ 2019-07-21  7:46 ` 김규래
  2019-07-22 18:54 ` Jakub Jelinek
  0 siblings, 1 reply; 12+ messages in thread

From: 김규래 @ 2019-07-21 7:46 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc

Hi Jakub,
About the snippet below,

  if (gomp_barrier_last_thread (state))
    {
      if (team->task_count == 0)
	{
	  gomp_team_barrier_done (&team->barrier, state);
	  gomp_mutex_unlock (&team->task_lock);
	  gomp_team_barrier_wake (&team->barrier, 0);
	  return;
	}
      gomp_team_barrier_set_waiting_for_tasks (&team->barrier);
    }

Am I safe to assume that gomp_barrier_last_thread is thread-safe?

Ray Kim
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-07-21  7:46 ` 김규래
@ 2019-07-22 18:54 ` Jakub Jelinek
  2019-07-22 19:00 ` 김규래
  0 siblings, 1 reply; 12+ messages in thread

From: Jakub Jelinek @ 2019-07-22 18:54 UTC (permalink / raw)
To: 김규래; +Cc: gcc

On Sun, Jul 21, 2019 at 04:46:33PM +0900, 김규래 wrote:
> About the snippet below,
>
> if (gomp_barrier_last_thread (state))
>   {
>     if (team->task_count == 0)
>       {
>         gomp_team_barrier_done (&team->barrier, state);
>         gomp_mutex_unlock (&team->task_lock);
>         gomp_team_barrier_wake (&team->barrier, 0);
>         return;
>       }
>     gomp_team_barrier_set_waiting_for_tasks (&team->barrier);
>   }
>
> Am I safe to assume that gomp_barrier_last_thread is thread-safe?

Yes, you can look up the definition.
gomp_barrier_last_thread is just a bit in the state bitmask passed to the
routine; it is set on the last thread that encounters the barrier, which is
figured out by doing an atomic subtraction from the counter.

	Jakub
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-07-22 18:54 ` Jakub Jelinek
@ 2019-07-22 19:00 ` 김규래
  2019-08-03  9:12 ` 김규래
  0 siblings, 1 reply; 12+ messages in thread

From: 김규래 @ 2019-07-22 19:00 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc

> Yes, you can look up the definition.
> gomp_barrier_last_thread is just a bit in the state bitmask passed to the
> routine; it is set on the last thread that encounters the barrier, which is
> figured out by doing an atomic subtraction from the counter.

I saw the implementation, just wanted to be sure that's the general case.
Thanks.

Ray Kim
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-07-22 19:00 ` 김규래
@ 2019-08-03  9:12 ` 김규래
  2019-08-05 10:32 ` Jakub Jelinek
  0 siblings, 1 reply; 12+ messages in thread

From: 김규래 @ 2019-08-03 9:12 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc

Hi,
I'm currently having trouble implementing the thread sleeping mechanism for when the queue runs out of tasks.
The problem is that it's hard to maintain consistency between the thread sleeping routine and the queues.
See the pseudocode below:

1. check that the queue is empty
2. go to sleep

If we go lock-free, consistency between 1 and 2 cannot be maintained.
Moreover, if we go to a multi-queue setting, simply checking whether all the queues are empty is not consistent.
I would really like to have the thread sleeping mechanism, but I'm not sure how we could do it.

Ray Kim
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-08-03  9:12 ` 김규래
@ 2019-08-05 10:32 ` Jakub Jelinek
  2019-08-05 11:01 ` 김규래
  0 siblings, 1 reply; 12+ messages in thread

From: Jakub Jelinek @ 2019-08-05 10:32 UTC (permalink / raw)
To: 김규래; +Cc: gcc

On Sat, Aug 03, 2019 at 06:11:58PM +0900, 김규래 wrote:
> I'm currently having trouble implementing the thread sleeping mechanism for when the queue runs out of tasks.
> The problem is that it's hard to maintain consistency between the thread sleeping routine and the queues.
> See the pseudocode below:
>
> 1. check that the queue is empty
> 2. go to sleep
>
> If we go lock-free, consistency between 1 and 2 cannot be maintained.

I thought we don't want to go lock-free; the queue operations aren't easily
implementable lock-free, but instead with a lock for each of the queues, so
in the multi-queue setting having locks on the implicit tasks that hold
those queues.  What can and should be done without a lock is perhaps some
preliminary check whether a queue is empty, which can be done through
__atomic_load.

And, generally, going to sleep is done outside of the critical section;
inside of the critical section we decide whether or not to go to sleep, and
then go to sleep using either (on Linux) futexes or semaphores.  Both have
the property that one can post to them before some other thread sleeps on
them, and in that case the other thread doesn't actually go to sleep.

The wake up (post on the semaphore, or updating the memory plus a later
futex wake) is sometimes done inside of a critical section: the memory
update if it is not an atomic increase/decrease, and the futex wake
depending on whether we remember from the atomic operation whether the
wake up is needed, deferring it until after the critical section.
Given say:

	  ++team->task_count;
	  ++team->task_queued_count;
	  gomp_team_barrier_set_task_pending (&team->barrier);
	  do_wake = team->task_running_count + !parent->in_tied_task
		    < team->nthreads;
	  gomp_mutex_unlock (&team->task_lock);
	  if (do_wake)
	    gomp_team_barrier_wake (&team->barrier, 1);

you can see the wake up is done outside of the critical section.

If team->task_lock isn't used, there will of course be problems: say,
team->task_count and team->task_queued_count need to be bumped atomically,
ditto operations on team->barrier, and the question is what to do with the
team->task_running_count check.  If that one is updated atomically too,
maybe __atomic_load might be good enough, though perhaps in the worst case
it might mean we don't wake anybody in some cases, so there will be threads
idling instead of doing useful work, but at least one thread probably
should handle it later.

	Jakub
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  2019-08-05 10:32 ` Jakub Jelinek
@ 2019-08-05 11:01 ` 김규래
  [not found] ` <20190819061020.GA27842@laptop.zalov.cz>
  0 siblings, 1 reply; 12+ messages in thread

From: 김규래 @ 2019-08-05 11:01 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc

> I thought we don't want to go lock-free; the queue operations aren't easily
> implementable lock-free, but instead with a lock for each of the queues,

Hi,
By lock-free I meant using locks only for the queues, but my terminology was indeed confusing; sorry about that.

> mean we don't wake anybody in some cases, so there will be threads idling
> instead of doing useful work, but at least one thread probably should
> handle it later.

I was personally worried about this case since it could result in huge inefficiencies, but maybe it'll be fine.
I'll first try to implement it.

Thanks,
Ray Kim
[parent not found: <20190819061020.GA27842@laptop.zalov.cz>]
* Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
  [not found] ` <20190819061020.GA27842@laptop.zalov.cz>
@ 2019-08-25  7:49 ` 김규래
  0 siblings, 0 replies; 12+ messages in thread

From: 김규래 @ 2019-08-25 7:49 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc

Hi Jakub,
I think the current semaphore sleep system ought to be improved, but I'm not sure how, and since the GSoC deadline is approaching I'll just post the results without the semaphores.
Instead of sleeping on a per-task basis (for example, there are depend waits, task waits, taskgroup waits, etc.), I think we should simply put the threads to sleep when the queue is empty and wake them up whenever a task finishes executing or a new task is added to the queue.
This shouldn't be too difficult to implement using semaphores.
However, since the current gomp semaphores are not always the most performant, I'm not absolutely certain how to do this.
I'll defer this to after GSoC.
Let me know if you have an idea.

Ray Kim
end of thread, other threads: [~2019-08-25  7:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <e2a9f7c55311795785d0f2c47f70acbd@cweb001.nm.nfra.io>
2019-06-24 19:55 ` [GSoC'19, libgomp work-stealing] Task parallelism runtime 김규래
2019-06-24 20:13   ` Jakub Jelinek
2019-07-13  7:46   ` John Pinkerton
2019-07-09 12:56 ` 김규래
2019-07-13  6:28   ` Jakub Jelinek
2019-07-21  7:46     ` 김규래
2019-07-22 18:54       ` Jakub Jelinek
2019-07-22 19:00         ` 김규래
2019-08-03  9:12           ` 김규래
2019-08-05 10:32             ` Jakub Jelinek
2019-08-05 11:01               ` 김규래
     [not found]                 ` <20190819061020.GA27842@laptop.zalov.cz>
2019-08-25  7:49                   ` 김규래