Message-ID: <4ABC2F40.7020905@gmail.com>
Date: Fri, 25 Sep 2009 05:39:00 -0000
From: Dave Korn <dave.korn.cygwin@googlemail.com>
To: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Subject: Any tips for debugging a GNAT tasking implementation problem?


    Hi all,

  Over on the cygwin-improvements branch(*) I've got a fairly nifty, fully
POSIX-based port of Ada, but there's one FAIL in the gnat testsuite that I'm
trying to debug.  It could be a bug in the port, or the testcase might have
exposed an underlying bug in Cygwin's pthread functions.  I'm hoping to get
some pointers to help me understand the architecture of the tasking control in
GNAT.

  The failing case is gnat.dg/task_stack_align.adb, which fails like so:

> $ ./task_stack_align.exe
> 
> raised TASKING_ERROR : Failure during activation
> 
> $

  Debugging it suggests that the problem arises in Activate_Tasks
(s-tassta.adb), here:

>       if Self_ID.Common.Activation_Failed then
>          Self_ID.Common.Activation_Failed := False;
>          raise Tasking_Error with "Failure during activation";
>       end if;

which I think is being triggered as a consequence of this sequence in
Vulnerable_Complete_Activation (also s-tassta.adb):

>       --  The activator raises a Tasking_Error if any task it is activating
>       --  is completed before the activation is done. However, if the reason
>       --  for the task completion is an abort, we do not raise an exception.
>       --  See RM 9.2(5).
> 
>       if not Self_ID.Callable and then Self_ID.Pending_ATC_Level /= 0 then
>          Activator.Common.Activation_Failed := True;
>       end if;
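
  To make sure I'm reading that handshake right, here's a hypothetical model
of it in C over plain pthreads.  It's only a sketch under my assumptions, not
the GNAT runtime: the names activator, wait_count, worker and activate_all
are invented, and the `fail` flag just stands in for "completed before
activation finished, for some reason other than abort".  Each worker reports
to the activator under a mutex, and the activator waits until every report
is in before checking the shared failure flag.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 64

/* Hypothetical activator: starts n workers and blocks until every one
   has reported that its activation is over. */
struct activator {
    pthread_mutex_t lock;
    pthread_cond_t  all_done;
    int             wait_count;        /* workers still activating */
    bool            activation_failed; /* stands in for Activation_Failed */
};

struct worker_arg {
    struct activator *act;
    bool              fail; /* simulate "completed during activation, not aborted" */
};

static void *worker(void *p)
{
    struct worker_arg *w = p;
    struct activator *a = w->act;

    /* Report the end of activation to the activator. */
    pthread_mutex_lock(&a->lock);
    if (w->fail)
        a->activation_failed = true;
    if (--a->wait_count == 0)
        pthread_cond_signal(&a->all_done);
    pthread_mutex_unlock(&a->lock);

    /* ...the task body proper would run here... */
    return NULL;
}

/* Returns 0 if all n workers activated, -1 if any reported a failure
   (the point where Activate_Tasks raises Tasking_Error), -2 for bad n. */
int activate_all(int n, int failing_index)
{
    struct activator a = { .wait_count = n, .activation_failed = false };
    pthread_t tids[MAX_WORKERS];
    struct worker_arg args[MAX_WORKERS];
    int created;
    bool failed;

    if (n < 1 || n > MAX_WORKERS)
        return -2;
    pthread_mutex_init(&a.lock, NULL);
    pthread_cond_init(&a.all_done, NULL);

    for (created = 0; created < n; created++) {
        args[created].act = &a;
        args[created].fail = (created == failing_index);
        if (pthread_create(&tids[created], NULL, worker, &args[created]) != 0)
            break;
    }

    pthread_mutex_lock(&a.lock);
    if (created < n) { /* threads that never started count as failures */
        a.wait_count -= n - created;
        a.activation_failed = true;
    }
    while (a.wait_count > 0) /* loop guards against spurious wakeups */
        pthread_cond_wait(&a.all_done, &a.lock);
    failed = a.activation_failed;
    a.activation_failed = false; /* reset, as the quoted Activate_Tasks code does */
    pthread_mutex_unlock(&a.lock);

    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&a.all_done);
    pthread_mutex_destroy(&a.lock);
    return failed ? -1 : 0;
}
```

In this model activate_all(50, -1) returns 0, while activate_all(50, 7)
returns -1 because worker 7 sets the flag before the activator wakes up.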

  If I look at the state of the tasks when the exception is raised, they all
claim to have terminated:

> Breakpoint 1, 0x004183ca in <__gnat_raise_exception> (e=0x42c38c,
>     message=0x4316e3) at a-exexda.adb:244
> 244        procedure Append_Info_Character
> (gdb) call list_tasks
> tasks(50): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(49): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(48): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
    [ ...  snip similar entries  ... ]
> tasks(31): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(30): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(29): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(28): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
[  I'm not yet sure whether there's any significance in the way the priority
field changes from 0 to 15 at this point.  ]
> tasks(27): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> tasks(26): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
    [ ...  snip similar entries  ... ]
> tasks(4): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> tasks(3): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> tasks(2): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> tasks(1): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> main_task: RUNNABLE, parent: <none>, prio: 15
> (gdb) call print_current_task
> main_task: RUNNABLE, parent: <none>, prio: 15
> (gdb)

  So, if I've understood what I'm seeing, there's this object called an
activator, and it has a whole bunch of threads (Ada tasks) that it wants to
start up in parallel, but it doesn't want them all to just start running
straight away; it wants them all to be created at once before any of them has
a chance to finish its work.

  That makes me think that it must be trying to create them in some suspended
state, or gate their progress behind a mutex or semaphore of some kind, so that
it can create them all and then wake them all at once when it's done.  Is this
right?  If so, can anyone point me at the mechanism that is supposed to hold
the threads back but appears to be failing in this case?  If not, can someone
tell me how task activation is supposed to work in this test?
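
  For concreteness, this is roughly the kind of gate I'm guessing at,
sketched in C with plain pthreads.  It's hypothetical and not taken from
GNAT: gate, gate_wait, gate_open and run_gated are made-up names.  The
creator starts every thread, each one parks on a condition variable, and a
single broadcast releases them all together.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical start gate: threads block in gate_wait() until
   gate_open() is called, so all exist before any runs its body. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  opened;
    bool            is_open;
    int             passed; /* threads that have got past the gate */
};

static void gate_wait(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->is_open) /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->opened, &g->lock);
    g->passed++;
    pthread_mutex_unlock(&g->lock);
}

static void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->is_open = true;
    pthread_cond_broadcast(&g->opened); /* wake every waiter at once */
    pthread_mutex_unlock(&g->lock);
}

static void *gated_thread(void *p)
{
    gate_wait(p);
    /* ...thread body would run here... */
    return NULL;
}

/* Creates n gated threads; records how many had passed the gate before
   it was opened and after all were joined. Returns 0 on success. */
int run_gated(int n, int *passed_before, int *passed_after)
{
    struct gate g = { .is_open = false, .passed = 0 };
    pthread_t tids[MAX_THREADS];
    int created;

    if (n < 0 || n > MAX_THREADS)
        return -1;
    pthread_mutex_init(&g.lock, NULL);
    pthread_cond_init(&g.opened, NULL);

    for (created = 0; created < n; created++)
        if (pthread_create(&tids[created], NULL, gated_thread, &g) != 0)
            break;

    pthread_mutex_lock(&g.lock);
    *passed_before = g.passed; /* nobody can pass until the gate opens */
    pthread_mutex_unlock(&g.lock);

    gate_open(&g);
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    *passed_after = g.passed; /* joins make this read safe */
    pthread_cond_destroy(&g.opened);
    pthread_mutex_destroy(&g.lock);
    return created == n ? 0 : -1;
}
```

With 50 threads the creator always sees 0 past the gate before opening it
and 50 after the joins; a gate that let threads through early would break
that first count.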

    cheers,
      DaveK