Date: Tue, 12 Jan 2016 13:46:00 -0000
From: Martin Jambor
To: Jakub Jelinek
Cc: GCC Patches
Subject: Re: [hsa 2/10] Modifications to libgomp proper
Message-ID: <20160112134652.GN3060@virgil.suse.cz>
References: <20151207111758.GA24234@virgil.suse.cz> <20151207111957.GC24234@virgil.suse.cz> <20151209113536.GP5675@tucnak.redhat.com> <20151210175223.GH3534@virgil.suse.cz> <20151211180529.GH5675@tucnak.redhat.com>
In-Reply-To: <20151211180529.GH5675@tucnak.redhat.com>

Hi,

On Fri, Dec 11, 2015 at 07:05:29PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 10, 2015 at 06:52:23PM +0100, Martin Jambor wrote:
> > > > --- a/libgomp/task.c
> > > > +++ b/libgomp/task.c
> > > > @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
> > > >        gomp_mutex_unlock (&team->task_lock);
> > > >      }
> > > >    ttask->state = GOMP_TARGET_TASK_FINISHED;
> > > > +  free (ttask->firstprivate_copies);
> > > >    gomp_target_task_completion (team, task);
> > > >    gomp_mutex_unlock (&team->task_lock);
> > > >  }
> > >
> > > So, this function should have a special case for the SHARED_MEM case;
> > > handle it similarly to, say, how GOMP_taskgroup_end handles the
> > > finish_cancelled: case.  Just note that the target task is missing
> > > from certain queues at that point.
> >
> > I'm afraid I need some help here.  I do not quite understand how
> > finish_cancelled in GOMP_taskgroup_end is similar; it seems to be
> > doing much more than freeing one pointer.  What exactly is the issue
> > with the above?
> >
> > Nevertheless, after reading through bits of task.c again, I wonder
> > whether any copying (for both the shared memory target and the host)
> > in gomp_target_task_fn is actually necessary, because it seems to
> > also be done in gomp_create_target_task.  Does that not apply somehow?
>
> The target task is scheduled for its first action as a normal task, and
> scheduling it already removes it from some of the queues (each task is
> put into 1-3 queues), i.e. the actions performed mostly by
> gomp_task_run_pre.  Then the team task lock is unlocked and the task is
> run.  Finally, for normal tasks, gomp_task_run_post_handle_depend,
> gomp_task_run_post_remove_parent, etc. are run.  Now, for async target
> tasks that have something running on some other device at that point,
> we don't do any of that, but instead mark the task
> GOMP_TASK_ASYNC_RUNNING and continue with other work until
> gomp_target_task_completion is run.
> For non-shared-mem targets, that needs to re-add the task to the queues
> so that it will be scheduled again.  But you don't need that for
> shared-mem target tasks; they can just free the firstprivate_copies and
> finalize the task.
> At the time gomp_target_task_completion is called, the task is pretty
> much in the same state as it is around the finish_cancelled:; label.
> So instead of what the gomp_target_task_completion function does, you
> would for SHARED_MEM do something like:
>   size_t new_tasks
>     = gomp_task_run_post_handle_depend (task, team);
>   gomp_task_run_post_remove_parent (task);
>   gomp_clear_parent (&task->children_queue);
>   gomp_task_run_post_remove_taskgroup (task);
>   team->task_count--;
>   do_wake = 0;
>   if (new_tasks > 1)
>     {
>       do_wake = team->nthreads - team->task_running_count
>                 - !task->in_tied_task;
>       if (do_wake > new_tasks)
>         do_wake = new_tasks;
>     }
>   // Unlike other places, the following will be also run with the
>   // task_lock held, but I'm afraid there is nothing to do about it.
>   // See the comment in gomp_target_task_completion.
>   gomp_finish_task (task);
>   free (task);
>   if (do_wake)
>     gomp_team_barrier_wake (&team->barrier, do_wake);
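
As I understand it, the requeueing that gomp_target_task_completion
performs for non-shared-memory devices, and which the SHARED_MEM path
can therefore skip, boils down to roughly the following.  This is only
a simplified sketch of my reading of task.c, not the exact code
(priorities and the taskwait/taskgroup wake-ups are elided):

  /* Put the finished async target task back onto the task queues so
     that a team thread picks it up once more and runs the unmapping
     step; only then is the task finalized and freed.  */
  if (task->parent)
    priority_queue_insert (PQ_CHILDREN, &task->parent->children_queue,
                           task, task->priority, PRIORITY_INSERT_BEGIN,
                           false, task->parent_depends_on);
  if (task->taskgroup)
    priority_queue_insert (PQ_TASKGROUP,
                           &task->taskgroup->taskgroup_queue,
                           task, task->priority, PRIORITY_INSERT_BEGIN,
                           false, task->parent_depends_on);
  priority_queue_insert (PQ_TEAM, &team->task_queue, task,
                         task->priority, PRIORITY_INSERT_END, false,
                         task->parent_depends_on);
  task->kind = GOMP_TASK_WAITING;
  /* Wake up a thread to service the requeued task.  */
  if (team->nthreads > team->task_running_count)
    gomp_team_barrier_wake (&team->barrier, 1);

None of that round trip is necessary on a shared-memory device, which
is what the special case in the patch below avoids.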
I tried the suggested SHARED_MEM code above, but the libgomp testcase
target-33.c always got stuck within the GOMP_taskgroup_end call, more
specifically in gomp_team_barrier_wait_end in config/linux/bar.c, where
the first call to gomp_barrier_handle_tasks left barrier->generation
with BAR_WAITING_FOR_TASK set and then nothing ever happened, even as
the completion callbacks fired.

After looking into the tasking mechanism for basically the whole day
yesterday, I *think* I have fixed it by calling
gomp_team_barrier_set_task_pending from the callback, and by adding
another hunk to gomp_barrier_handle_tasks so that it clears that
barrier flag even if it has not picked up any tasks.
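
For reference, the generation bits involved behave roughly as follows;
this is my paraphrase of the helpers in config/linux/bar.h, not the
verbatim code:

  /* BAR_TASK_PENDING tells threads waiting on the team barrier that
     there is (or may be) a task to service; BAR_WAITING_FOR_TASK marks
     a barrier that cannot complete until team->task_count drops to
     zero.  Both live in the flag bits of bar->generation.  */

  static inline void
  gomp_team_barrier_set_task_pending (gomp_barrier_t *bar)
  {
    __atomic_or_fetch (&bar->generation, BAR_TASK_PENDING,
                       MEMMODEL_RELAXED);
  }

  static inline bool
  gomp_team_barrier_waiting_for_tasks (gomp_barrier_t *bar)
  {
    return (bar->generation & BAR_WAITING_FOR_TASK) != 0;
  }

  static inline void
  gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
  {
    /* Advance to the next generation, dropping the flag bits, so that
       the wait loop in gomp_team_barrier_wait_end can terminate.  */
    bar->generation = (state & -BAR_INCR) + BAR_INCR;
  }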
Please let me know if you think this makes sense.  If so, I'll include
it in an HSA patch set I hope to generate today.  Otherwise, I guess
I'd prefer to remove the shared-memory path and revert to the old
behavior as a temporary measure until we find out what was wrong.

Thanks, and sorry that it took me so long to resolve this,

Martin


diff --git a/libgomp/task.c b/libgomp/task.c
index ab5df51..828c1fb 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -566,6 +566,14 @@ gomp_target_task_completion (struct gomp_team *team, struct gomp_task *task)
     gomp_team_barrier_wake (&team->barrier, 1);
 }
 
+static inline size_t
+gomp_task_run_post_handle_depend (struct gomp_task *child_task,
+                                  struct gomp_team *team);
+static inline void
+gomp_task_run_post_remove_parent (struct gomp_task *child_task);
+static inline void
+gomp_task_run_post_remove_taskgroup (struct gomp_task *child_task);
+
 /* Signal that a target task TTASK has completed the asynchronously
    running phase and should be requeued as a task to handle the
    variable unmapping.  */
@@ -584,8 +592,34 @@ GOMP_PLUGIN_target_task_completion (void *data)
       gomp_mutex_unlock (&team->task_lock);
       return;
     }
   ttask->state = GOMP_TARGET_TASK_FINISHED;
-  free (ttask->firstprivate_copies);
-  gomp_target_task_completion (team, task);
+
+  if (ttask->devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    {
+      free (ttask->firstprivate_copies);
+      size_t new_tasks
+        = gomp_task_run_post_handle_depend (task, team);
+      gomp_task_run_post_remove_parent (task);
+      gomp_clear_parent (&task->children_queue);
+      gomp_task_run_post_remove_taskgroup (task);
+      team->task_count--;
+      int do_wake = 0;
+      if (new_tasks)
+        {
+          do_wake = team->nthreads - team->task_running_count
+                    - !task->in_tied_task;
+          if (do_wake > new_tasks)
+            do_wake = new_tasks;
+        }
+      /* Unlike other places, the following will be also run with the task_lock
+         held, but there is nothing to do about it.  See the comment in
+         gomp_target_task_completion.  */
+      gomp_finish_task (task);
+      free (task);
+      gomp_team_barrier_set_task_pending (&team->barrier);
+      gomp_team_barrier_wake (&team->barrier, do_wake ? do_wake : 1);
+    }
+  else
+    gomp_target_task_completion (team, task);
   gomp_mutex_unlock (&team->task_lock);
 }
 
@@ -1275,7 +1309,12 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
           thr->task = task;
         }
       else
-        return;
+        {
+          if (team->task_count == 0
+              && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+            gomp_team_barrier_done (&team->barrier, state);
+          return;
+        }
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
         {
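
P.S. To spell out why I believe both hunks are needed: the wait loop in
gomp_team_barrier_wait_end looks roughly like the following (simplified
from config/linux/bar.c; do_wait is the futex wrapper, and I have left
out the cancellation handling):

  unsigned int generation = state, gen;
  do
    {
      /* Sleep until bar->generation changes.  */
      do_wait ((int *) &bar->generation, generation);
      gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
      if (__builtin_expect (gen & BAR_TASK_PENDING, 0))
        {
          /* Somebody flagged a task (or a completion callback) to
             service; go run gomp_barrier_handle_tasks.  */
          gomp_barrier_handle_tasks (state);
          gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
        }
      generation |= gen & BAR_WAITING_FOR_TASK;
    }
  while (gen != state + BAR_INCR);

Without gomp_team_barrier_set_task_pending in the callback, the waiters
never re-enter gomp_barrier_handle_tasks once the last target task has
finished, and without the new hunk in gomp_barrier_handle_tasks, nothing
calls gomp_team_barrier_done when the callback itself finalized the last
task, so gen never reaches state + BAR_INCR and the loop above never
terminates.  That matches the hang I saw in target-33.c.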