From: Jakub Jelinek <jakub@redhat.com>
To: Kwok Cheung Yeung <kcy@codesourcery.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]
Date: Wed, 24 Feb 2021 20:46:25 +0100 [thread overview]
Message-ID: <20210224194625.GC4020736@tucnak> (raw)
In-Reply-To: <253a1682-5e95-c8a1-4274-ad1d9db65c44@codesourcery.com>
On Wed, Feb 24, 2021 at 06:17:01PM +0000, Kwok Cheung Yeung wrote:
> > 1) while linux --enable-futex and accel gomp_sem_t is small (int), rtems
> > and especially posix gomp_sem_t is large; so while it might be a good
> > idea to inline gomp_sem_t on config/{linux,accel} into the union, for
> > the rest it might be better to use indirection; if it is only for the
> > undeferred tasks, it could be just using an automatic variable and
> > put into the struct address of that; could be done either always,
> > or define some macro in config/{linux,accel}/sem.h that gomp_sem_t is
> > small and decide on the indirection based on that macro
>
> I think a pointer to an automatic variable would be simplest.
Agreed.
> Can anything in cpyfn make use of the fact that kind==GOMP_TASK_UNDEFERRED
> while executing it? Anyway, if we want to keep this, then I suppose we could
> just add an extra field deferred_p that does not change for the lifetime of
> the task to indicate that the task is 'really' a deferred task.
Adding a bool is fine, but see bellow.
> > 3) kind is not constant, for the deferred tasks it can change over the
> > lifetime of the task, as you've said in the comments, it is kind ==
> > GOMP_TASK_UNDEFERRED vs. other values; while the changes of task->kind
> > are done while holding the task lock, omp_fulfill_event reads it before
> > locking that lock, so I think it needs to be done using
> > if (__atomic_load_n (&task->kind, MEMMODEL_RELAXED) == GOMP_TASK_UNDEFERRED)
> > Pedantically the stores to task->kind also need to be done
> > with __atomic_store_n MEMMODEL_RELAXED.
>
> If we check task->deferred_p instead (which never changes for a task after
> instantiation), is that still necessary?
Not for kind or the new field.
> > - in gomp_barrier_handle_tasks the reason for if (new_tasks > 1)
> > is that if there is a single dependent task, the current thread
> > just finished handling one task and so can take that single task and so no
> > need to wake up. While in the omp_fulfill_event case, even if there
> > is just one new task, we need to schedule it to some thread and so
> > is desirable to wake some thread.
>
> In that case, we could just do 'if (new_tasks > 0)' instead?
Yes.
>
> > All we know
> > (if team == gomp_thread ()->ts.team) is that at least one thread is doing
> > something else but that one could be busy for quite some time.
>
> Well, it should still get around to the new task eventually, so there is no
> problem in terms of correctness here. I suppose we could always wake up one
> more thread than strictly necessary, but that might have knock-on effects on
> performance elsewhere?
Yeah, waking something unnecessarily is always going to cause performance
problems.
> I have applied your patch to move the gomp_team_barrier_done, and in
> omp_fulfill_event, I ensure that a single thread is woken up so that
> gomp_barrier_handle_tasks can signal for the barrier to finish.
>
> I'm having some trouble coming up with a testcase to test this scenario
> though. I tried having a testcase like this to have threads in separate
> teams:
The unshackeled thread testcase would probably need a pthread_create
call and restricting the testcase to POSIX threads targets.
The teams in host teams (or target) don't have at least in OpenMP a way
to serialize, e.g. it can always be implemented like we do ATM.
But I guess that testcase can be done incrementally.
> @@ -545,8 +548,15 @@ struct gomp_task
> entries and the gomp_task in which they reside. */
> struct priority_node pnode[3];
>
> - bool detach;
> - gomp_sem_t completion_sem;
> + union {
> + /* Valid only if deferred_p is false. */
> + gomp_sem_t *completion_sem;
> + /* Valid only if deferred_p is true. Set to the team that executes the
> + task if the task is detached and the completion event has yet to be
> + fulfilled. */
> + struct gomp_team *detach_team;
> + };
> + bool deferred_p;
>
> struct gomp_task_icv icv;
> void (*fn) (void *);
What I don't like is that this creates too much wasteful padding
in a struct that should be as small as possible.
At least on 64-bit hosts which we care about most, pahole shows with your
patch:
struct gomp_task {
struct gomp_task * parent; /* 0 8 */
struct priority_queue children_queue; /* 8 32 */
struct gomp_taskgroup * taskgroup; /* 40 8 */
struct gomp_dependers_vec * dependers; /* 48 8 */
struct htab * depend_hash; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct gomp_taskwait * taskwait; /* 64 8 */
size_t depend_count; /* 72 8 */
size_t num_dependees; /* 80 8 */
int priority; /* 88 4 */
/* XXX 4 bytes hole, try to pack */
struct priority_node pnode[3]; /* 96 48 */
/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
union {
gomp_sem_t * completion_sem; /* 144 8 */
struct gomp_team * detach_team; /* 144 8 */
}; /* 144 8 */
_Bool deferred_p; /* 152 1 */
/* XXX 7 bytes hole, try to pack */
struct gomp_task_icv icv; /* 160 40 */
/* --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- */
void (*fn)(void *); /* 200 8 */
void * fn_data; /* 208 8 */
enum gomp_task_kind kind; /* 216 4 */
_Bool in_tied_task; /* 220 1 */
_Bool final_task; /* 221 1 */
_Bool copy_ctors_done; /* 222 1 */
_Bool parent_depends_on; /* 223 1 */
struct gomp_task_depend_entry depend[]; /* 224 0 */
/* size: 224, cachelines: 4, members: 21 */
/* sum members: 213, holes: 2, sum holes: 11 */
/* last cacheline: 32 bytes */
};
So perhaps it might be better to put the new 1 fields before int priority;
field, in order bool deferred_p; union { };
That way, there will be just 3 bytes hole in the whole struct,
not 4 + 7 byte holes.
>
> - if (task.detach && !task_fulfilled_p (&task))
> - gomp_sem_wait (&task.completion_sem);
> + if ((flags & GOMP_TASK_FLAG_DETACH) != 0 && detach)
> + gomp_sem_wait (&completion_sem);
I think gomp_sem_destroy is missing here (in the conditional if it was
only initialized. Furthermore, I don't understand the && detach,
the earlier code assumes that if (flags & GOMP_TASK_FLAG_DETACH) != 0
then it can dereference *(void *)) detach, so the && detach seems
to be unnecessary.
> @@ -484,15 +483,16 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
> task->kind = GOMP_TASK_UNDEFERRED;
> task->in_tied_task = parent->in_tied_task;
> task->taskgroup = taskgroup;
> + task->deferred_p = true;
> if ((flags & GOMP_TASK_FLAG_DETACH) != 0)
> {
> - task->detach = true;
> - gomp_sem_init (&task->completion_sem, 0);
> - *(void **) detach = &task->completion_sem;
I think you can move task->deferred_p into the if stmt.
> + if (!shackled_thread_p
> + && !do_wake
> + && gomp_team_barrier_waiting_for_tasks (&team->barrier)
> + && team->task_detach_count == 0)
&& team->task_detach_count == 0 is cheaper than the
&& gomp_team_barrier_waiting_for_tasks (&team->barrier)
so please swap those two.
> + {
> + /* Ensure that at least one thread is woken up to signal that the
> + barrier can finish. */
> + do_wake = 1;
> + }
Please drop the {}s around the single do_wake = 1; stmt.
Otherwise LGTM.
Jakub
next prev parent reply other threads:[~2021-02-24 19:46 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-21 19:33 Kwok Cheung Yeung
2021-01-21 22:46 ` Kwok Cheung Yeung
2021-01-29 15:03 ` Jakub Jelinek
2021-02-12 14:36 ` H.J. Lu
2021-02-19 19:12 ` [WIP] " Kwok Cheung Yeung
2021-02-22 13:49 ` Jakub Jelinek
2021-02-22 18:14 ` Jakub Jelinek
2021-02-24 18:17 ` Kwok Cheung Yeung
2021-02-24 19:46 ` Jakub Jelinek [this message]
2021-02-25 16:21 ` Kwok Cheung Yeung
2021-02-25 16:38 ` Jakub Jelinek
2021-02-23 21:43 ` Kwok Cheung Yeung
2021-02-23 21:52 ` Jakub Jelinek
2021-03-11 16:52 ` Thomas Schwinge
2021-03-25 12:02 ` Thomas Schwinge
2021-03-26 14:42 ` [Patch] libgomp: Fix on_device_arch.c aux-file handling [PR99555] (was: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]) Tobias Burnus
2021-03-26 14:46 ` Jakub Jelinek
2021-03-26 15:19 ` Tobias Burnus
2021-03-26 15:22 ` Jakub Jelinek
2021-03-29 9:09 ` [Patch] libgomp: Fix on_device_arch.c aux-file handling [PR99555] Thomas Schwinge
2021-04-09 11:00 ` [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738] Thomas Schwinge
2021-04-15 9:19 ` Thomas Schwinge
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210224194625.GC4020736@tucnak \
--to=jakub@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=kcy@codesourcery.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).