From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 22 Feb 2021 14:49:43 +0100
From: Jakub Jelinek
To: Kwok Cheung Yeung
Cc: GCC Patches
Subject: Re: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]
Message-ID: <20210222134943.GA1837485@tucnak>
References: <20210129150317.GN4020736@tucnak> <0aca6daf-356a-7b03-c007-e33c35114356@codesourcery.com>
In-Reply-To: <0aca6daf-356a-7b03-c007-e33c35114356@codesourcery.com>
List-Id: Gcc-patches mailing list

On Fri, Feb 19, 2021 at 07:12:42PM +0000, Kwok Cheung Yeung wrote:
> I have opted for a union of completion_sem (for tasks that are undeferred)
> and a struct gomp_team *detach_team (for deferred tasks) that holds the
> team if the completion event has not yet been fulfilled, or NULL if it
> has.  I don't see the point of having an indirection to the union here
> since the union is just the size of a pointer, so it might as well be
> inlined.

I see three issues with the union of completion_sem and detach_team done
that way.
1) While linux --enable-futex and accel gomp_sem_t is small (an int), rtems
and especially posix gomp_sem_t is large; so while it might be a good idea
to inline gomp_sem_t on config/{linux,accel} into the union, for the rest
it might be better to use an indirection.  If it is only for the undeferred
tasks, it could just use an automatic variable and put the address of that
into the struct.  That could be done either always, or we could define some
macro in config/{linux,accel}/sem.h saying that gomp_sem_t is small and
decide on the indirection based on that macro.

2) kind == GOMP_TASK_UNDEFERRED is true also for the deferred tasks while
running the cpyfn callback; guess this could be dealt with by making sure
the detach handling is done only after

      thr->task = task;
      if (cpyfn)
	{
	  cpyfn (arg, data);
	  task->copy_ctors_done = true;
	}
      else
	memcpy (arg, data, arg_size);
      thr->task = parent;
      task->kind = GOMP_TASK_WAITING;
      task->fn = fn;
      task->fn_data = arg;
      task->final_task = (flags & GOMP_TASK_FLAG_FINAL) >> 1;

I see you've instead removed the GOMP_TASK_UNDEFERRED handling, but the
rationale for that is that the copy constructors are being run
synchronously.

3) kind is not constant; for the deferred tasks it can change over the
lifetime of the task, and as you've said in the comments, what matters is
kind == GOMP_TASK_UNDEFERRED vs. other values.  While the changes of
task->kind are done while holding the task lock, omp_fulfill_event reads
it before locking that lock, so I think the read needs to be done using

  if (__atomic_load_n (&task->kind, MEMMODEL_RELAXED) == GOMP_TASK_UNDEFERRED)

Pedantically, the stores to task->kind also need to be done with
__atomic_store_n MEMMODEL_RELAXED.
Now, similarly to 3) on task->kind, task->detach_team is a similar case:
again, some other omp_fulfill_event can clear it (under lock, but it is
still read outside of the lock), so it probably should be read with

  struct gomp_team *team
    = __atomic_load_n (&task->detach_team, MEMMODEL_RELAXED);

And again, pedantically the detach_team stores should be atomic relaxed
stores too.

> > Do you agree with this, or see some reason why this can't work?
>
> The main problem I see is this code in gomp_barrier_handle_tasks:
>
>   if (--team->task_count == 0
>       && gomp_team_barrier_waiting_for_tasks (&team->barrier))
>     {
>       gomp_team_barrier_done (&team->barrier, state);
>
> We do not have access to state from within omp_fulfill_event, so how
> should this be handled?

Sure, omp_fulfill_event shouldn't do any waiting; it just needs to wake up
anything that could have been waiting.

> @@ -688,8 +697,7 @@ struct gomp_team
>    int work_share_cancelled;
>    int team_cancelled;
>
> -  /* Tasks waiting for their completion event to be fulfilled.  */
> -  struct priority_queue task_detach_queue;
> +  /* Number of tasks waiting for their completion event to be fulfilled.  */
>    unsigned int task_detach_count;

Do we need task_detach_count?  Currently it is only initialized and
incremented/decremented, but never tested for anything.  Though see below.

> +      gomp_debug (0, "omp_fulfill_event: %p event fulfilled for finished task\n",
> +		  task);
> +      size_t new_tasks = gomp_task_run_post_handle_depend (task, team);
> +      gomp_task_run_post_remove_parent (task);
> +      gomp_clear_parent (&task->children_queue);
> +      gomp_task_run_post_remove_taskgroup (task);
> +      team->task_count--;
> +      team->task_detach_count--;
> +
> +      /* Wake up any threads that may be waiting for the detached task
> +	 to complete.
> +	 */
> +      struct gomp_task *parent = task->parent;
> +
> +      if (parent && parent->taskwait)
> +	{
> +	  if (parent->taskwait->in_taskwait)
> +	    {
> +	      parent->taskwait->in_taskwait = false;
> +	      gomp_sem_post (&parent->taskwait->taskwait_sem);
> +	    }
> +	  else if (parent->taskwait->in_depend_wait)
> +	    {
> +	      parent->taskwait->in_depend_wait = false;
> +	      gomp_sem_post (&parent->taskwait->taskwait_sem);
> +	    }
> +	}

Looking at gomp_task_run_post_remove_parent, doesn't that function already
handle the in_taskwait and in_depend_wait gomp_sem_posts?

> +      if (task->taskgroup && task->taskgroup->in_taskgroup_wait)
> +	{
> +	  task->taskgroup->in_taskgroup_wait = false;
> +	  gomp_sem_post (&task->taskgroup->taskgroup_sem);
> +	}

And in gomp_task_run_post_remove_taskgroup, doesn't that already handle
the in_taskgroup_wait gomp_sem_post?

> +
> +      int do_wake = 0;
> +      if (new_tasks > 1)
> +	{
> +	  do_wake = team->nthreads - team->task_running_count;
> +	  if (do_wake > new_tasks)
> +	    do_wake = new_tasks;
> +	}
> +
> +      gomp_mutex_unlock (&team->task_lock);
> +      if (do_wake)
> +	gomp_team_barrier_wake (&team->barrier, do_wake);

I think for the barrier case we need to distinguish the
team == gomp_thread ()->ts.team case (I guess the more usual one).  There
the fact that some thread (the one calling omp_fulfill_event) is provably
doing something other than sitting on a barrier waiting for tasks makes it
much simpler: gomp_team_barrier_set_task_pending (&team->barrier) is
needed only when some tasks dependent on completion of the current one
were added to the queues, but gomp_task_run_post_handle_depend ->
gomp_task_run_post_handle_dependers takes care of that call, and the above
do_wake handles that too.  Though the exact details I think need fixing -
in gomp_barrier_handle_tasks the reason for if (new_tasks > 1) is that if
there is a single dependent task, the current thread has just finished
handling one task and so can take that single task itself, so there is no
need to wake anyone up.
While in the omp_fulfill_event case, even if there is just one new task,
we need to schedule it to some thread, and so it is desirable to wake some
thread.  All we know (if team == gomp_thread ()->ts.team) is that at least
one thread is doing something else, but that one could be busy for quite
some time.

And the other case is the omp_fulfill_event call from an unshackeled
thread, i.e. team != gomp_thread ()->ts.team.  Here, e.g. what
gomp_target_task_completion talks about applies:

  /* I'm afraid this can't be done after releasing team->task_lock,
     as gomp_target_task_completion is run from unrelated thread and
     therefore in between gomp_mutex_unlock and gomp_team_barrier_wake
     the team could be gone already.  */

Even there, there are 2 different cases.  One is where
team->task_running_count > 0; at that point we know at least one task is
running, and so the only thing that is unsafe is doing
gomp_team_barrier_wake (&team->barrier, do_wake); after
gomp_mutex_unlock (&team->task_lock); - there is a possibility that in
between the two calls the thread running omp_fulfill_event gets
interrupted or just delayed and the team finishes the barrier and is freed
too.  So the gomp_team_barrier_wake needs to be done before the unlock in
that case.

And then there is the case where all tasks finish on a barrier but some
haven't been fulfilled yet.  In that case, when the last thread calls
gomp_team_barrier_wait_end, we run in there:

  if (__builtin_expect (state & BAR_WAS_LAST, 0))
    ...
    if (__builtin_expect (team->task_count, 0))
      {
	gomp_barrier_handle_tasks (state);
	state &= ~BAR_WAS_LAST;
      }

and gomp_barrier_handle_tasks will hit:

  gomp_mutex_lock (&team->task_lock);
  if (gomp_barrier_last_thread (state))
    {
      if (team->task_count == 0)
	{
	  gomp_team_barrier_done (&team->barrier, state);
	  gomp_mutex_unlock (&team->task_lock);
	  gomp_team_barrier_wake (&team->barrier, 0);
	  return;
	}
      gomp_team_barrier_set_waiting_for_tasks (&team->barrier);
    }

but team->task_count is not 0 (because of the one or more non-fulfilled
tasks), priority_queue_empty_p is true, and so after unlocking the
function will just return.  And then it will jump to do_wait and on
wake-ups do gomp_barrier_handle_tasks again.

So, I think for the

  team != gomp_thread ()->ts.team
  && !do_wake
  && gomp_team_barrier_waiting_for_tasks (&team->barrier)
  && team->task_detach_count == 0

case, we need to wake up 1 thread anyway and arrange for it to do:

  gomp_team_barrier_done (&team->barrier, state);
  gomp_mutex_unlock (&team->task_lock);
  gomp_team_barrier_wake (&team->barrier, 0);

Possibly, after the

  if (!priority_queue_empty_p (&team->task_queue, MEMMODEL_RELAXED))

block we could add

  else if (team->task_count == 0
	   && gomp_team_barrier_waiting_for_tasks (&team->barrier))
    {
      gomp_team_barrier_done (&team->barrier, state);
      gomp_mutex_unlock (&team->task_lock);
      gomp_team_barrier_wake (&team->barrier, 0);
      if (to_free)
	{
	  gomp_finish_task (to_free);
	  free (to_free);
	}
      return;
    }

but then the:

  if (--team->task_count == 0
      && gomp_team_barrier_waiting_for_tasks (&team->barrier))
    {
      gomp_team_barrier_done (&team->barrier, state);
      gomp_mutex_unlock (&team->task_lock);
      gomp_team_barrier_wake (&team->barrier, 0);
      gomp_mutex_lock (&team->task_lock);
    }

in that case would be incorrect - we don't want to do that twice.  So,
either that second if would need to do the to_free handling and return
instead of taking the lock again and looping, or perhaps we can just do
--team->task_count; there instead and let the above added else if handle
that?

	Jakub