Date: Wed, 05 Jun 2019 18:52:00 -0000
From: Jakub Jelinek <jakub@redhat.com>
To: =?utf-8?B?6rmA6rec656Y?= <msca8h@naver.com>
Cc: gcc@gcc.gnu.org
Subject: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
Message-ID: <20190605185233.GG19695@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <bd596217abbd78871156ebee87cb5b@cweb006.nm.nfra.io> <20190603182101.GS19695@tucnak> <ab3513bf786d30a4ad51bf31945828f4@cweb008.nm.nfra.io>

On Thu, Jun 06, 2019 at 03:25:24AM +0900, 김규래 wrote:
> Hi, thanks for the detailed explanation.
> I think I now get the picture.
> Judging from my current understanding, the task-parallelism currently works as follows: 
> 1. Tasks are placed in a global shared queue.

It isn't a global shared queue but a per-team one; in fact there are 3
different queues, all guarded by the same per-team mutex team->task_lock:
team->task_queue	used on barriers
task->children_queue	used for #pragma omp taskwait
taskgroup->taskgroup_queue	used at the end of #pragma omp taskgroup
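
As a rough illustration of why the single mutex matters, here is a minimal
sketch of an enqueue path. The struct layouts and the enqueue_child helper
are hypothetical simplifications, not libgomp's actual definitions; only the
names task_lock, task_queue, children_queue and taskgroup_queue are taken
from the list above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a real queue: it only counts entries.  */
struct queue { size_t len; };

struct team
{
  pthread_mutex_t task_lock;   /* the single per-team mutex */
  struct queue task_queue;     /* used on barriers */
};

struct task { struct queue children_queue; };        /* #pragma omp taskwait */
struct taskgroup { struct queue taskgroup_queue; };  /* end of taskgroup */

/* Enqueue one child task (assumed helper).  All queues are updated under
   the same team-wide lock, so every enqueue in the team serializes here.  */
static void
enqueue_child (struct team *team, struct task *parent, struct taskgroup *tg)
{
  assert (team != NULL && parent != NULL);
  pthread_mutex_lock (&team->task_lock);
  team->task_queue.len++;
  parent->children_queue.len++;
  if (tg != NULL)               /* the task may not be in any taskgroup */
    tg->taskgroup_queue.len++;
  pthread_mutex_unlock (&team->task_lock);
}
```

With per-worker queues, the lock in this sketch is the part that would have
to be split up or replaced.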

> 2. Workers consume the tasks by bashing the queue in a while loop, just as in self-scheduling (dynamic scheduling).
>  
> Then the improvements including work-stealing must be done by:
> 1. Each worker holds a dedicated task queue, reducing resource contention.
> 2. Tasks are distributed in a round-robin fashion.
> 3. Work-stealing resolves the load imbalance.
>  
> If the above statements are correct, I guess the task priority should be given some special treatment?

Yes, one thing to consider is task priority.  Another is what to do on
#pragma omp taskwait or #pragma omp taskgroup waits.  There, one can
schedule most of the tasks (the OpenMP specification has restrictions on
what can be scheduled, but they are mostly relevant to untied tasks, which
we don't support anyway; we ignore untied and schedule them like tied
tasks).  It might be more beneficial, though (and this is what is currently
implemented), to queue only the tasks that will help satisfy what we are
waiting on.  Both priority and scheduling only the task's or taskgroup's
children instead of all tasks are just hints; the implementation may choose
other tasks.
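
To make the two hints concrete, here is a small sketch of a selection
routine. The flat array, the field names and pick_task itself are all
assumptions made for illustration; libgomp uses its own queue structures.
Passing children_only = 1 models the behaviour described above as currently
implemented, and children_only = 0 models the permitted alternative of
running any task.

```c
#include <stddef.h>

/* Hypothetical task record for illustration only.  */
struct task
{
  int priority;   /* larger value means the task is preferred */
  int is_child;   /* nonzero if it is a child of the waiting task */
  int done;       /* nonzero once the task has finished */
};

/* Return the highest-priority unfinished task, or NULL if there is none.
   If children_only is nonzero, consider only children of the waiter.  */
static struct task *
pick_task (struct task *tasks, size_t n, int children_only)
{
  struct task *best = NULL;
  for (size_t i = 0; i < n; i++)
    {
      struct task *t = &tasks[i];
      if (t->done || (children_only && !t->is_child))
        continue;
      if (best == NULL || t->priority > best->priority)
        best = t;
    }
  return best;
}
```

A work-stealing design would have to decide which of these two policies a
thief follows when it takes a task from another worker's queue.
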
There is another thing though: tasks with dependencies.
gomp_task_handle_depend, which needs to look up addresses in hash tables
for the dependencies, also uses the team->task_lock mutex.
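
The following sketch shows the general shape of such a lookup, assuming a
tiny fixed-size open-addressing table. The handle_depend function, the
dep_entry layout and SLOTS are hypothetical and much simpler than what
gomp_task_handle_depend really does; the point is only that the address
lookup happens while the shared task lock is held.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64   /* assumed table size; no resizing, no deletion */

/* Hypothetical entry: which task last named this address in a depend.  */
struct dep_entry { const void *addr; int last_task; };

/* Register that task_id depends on addr and return the id of the task
   that previously registered the same address, or -1 if there was none.
   Assumes fewer than SLOTS distinct addresses are ever inserted.  */
static int
handle_depend (pthread_mutex_t *task_lock, struct dep_entry *table,
               const void *addr, int task_id)
{
  int prev = -1;
  pthread_mutex_lock (task_lock);        /* same lock as the task queues */
  size_t i = ((uintptr_t) addr >> 3) % SLOTS;
  while (table[i].addr != NULL && table[i].addr != addr)
    i = (i + 1) % SLOTS;                 /* linear probing */
  if (table[i].addr == addr)
    prev = table[i].last_task;
  table[i].addr = addr;
  table[i].last_task = task_id;
  pthread_mutex_unlock (task_lock);
  return prev;
}
```

So even with per-worker task queues, creating tasks with dependencies
would still contend on this one lock unless the lookup is reworked too.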

	Jakub