From: Dave Korn
Date: Fri, 25 Sep 2009 05:39:00 -0000
To: "gcc@gcc.gnu.org"
Subject: Any tips for debugging a GNAT tasking implementation problem?
Message-ID: <4ABC2F40.7020905@gmail.com>

    Hi all,

  Over on the cygwin-improvements branch(*) I've got a fairly nifty fully
POSIX-based port of Ada, but there's one FAIL in the gnat testsuite that I'm
trying to debug.  It could be a bug in the port, or the testcase might have
stressed an underlying bug in Cygwin's pthread functions.  I'm hoping to get
some pointers to help me understand the architecture of the tasking control in
GNAT.

  The failing case is gnat.dg/task_stack_align.adb, which fails like so:

> $ ./task_stack_align.exe
>
> raised TASKING_ERROR : Failure during activation
>
> $

  Debugging it suggests that the problem arises in Activate_Tasks
(s-tassta.adb), here:

>    if Self_ID.Common.Activation_Failed then
>       Self_ID.Common.Activation_Failed := False;
>       raise Tasking_Error with "Failure during activation";
>    end if;

which I think is triggering as a consequence of this sequence in
Vulnerable_Complete_Activation (also s-tassta.adb):

>    -- The activator raises a Tasking_Error if any task it is activating
>    -- is completed before the activation is done. However, if the reason
>    -- for the task completion is an abort, we do not raise an exception.
>    -- See RM 9.2(5).
>
>    if not Self_ID.Callable and then Self_ID.Pending_ATC_Level /= 0 then
>       Activator.Common.Activation_Failed := True;
>    end if;

  If I take a look at the state of the tasks when the exception is raised,
they all claim to have terminated:

> Breakpoint 1, 0x004183ca in <__gnat_raise_exception> (e=0x42c38c,
>     message=0x4316e3) at a-exexda.adb:244
> 244        procedure Append_Info_Character
> (gdb) call list_tasks
> tasks(50): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(49): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(48): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred

             [ ... snip similar entries ... ]

> tasks(31): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(30): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(29): TERMINATED, parent: main_task, prio: 0, not callable, abort deferred
> tasks(28): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred

  [ I'm not yet sure whether there's any significance in the way the priority
    field changes from 0 to 15 at this point. ]

> tasks(27): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> tasks(26): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred

             [ ... snip similar entries ... ]

> tasks(4): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> tasks(3): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> tasks(2): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> tasks(1): TERMINATED, parent: main_task, prio: 15, not callable, abort deferred
> main_task: RUNNABLE, parent: , prio: 15
> (gdb) call print_current_task
> main_task: RUNNABLE, parent: , prio: 15
> (gdb)

  So, if I've understood what I'm seeing, there's this object called an
activator, and it has a whole bunch of threads (Ada tasks) that it wants to
start up in parallel.  It doesn't want them all to start running straight
away, though; it wants every one of them to have been created before any of
them has a chance to finish its work.  That makes me think that it must be
trying to create them in some suspended state, or gating their progress past a
mutex or semaphore of some kind, so that it can create them all and then wake
them all at once when it's done.

  Is this right?  If so, can anyone point me at the mechanism that is supposed
to hold the threads back but appears to be failing in this case?  If not, can
someone tell me how the task activation is supposed to work in this test?

    cheers,
      DaveK
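
  To make the guess about gating concrete, here is a minimal sketch of the
kind of mechanism described: the creator holds a mutex while it starts every
thread, and each new thread has to pass through that mutex before doing any
work.  It is written in Python purely for illustration and is not taken from
the GNAT runtime; start_gated_workers, gate and events are hypothetical names,
and whether GNAT does anything like this is exactly the open question.

```python
import threading


def start_gated_workers(count):
    """Start `count` threads whose bodies cannot run until all are created.

    Illustrative only: this models the hypothesised gate, not GNAT's code.
    Returns an ordered list of ("created" | "ran", index) events.
    """
    gate = threading.Lock()      # held by the creator during the creation loop
    log_lock = threading.Lock()  # protects the shared event list
    events = []

    def worker(index):
        # Block here until the creator releases the gate, then pass straight
        # through so the next waiting worker can follow.
        with gate:
            pass
        with log_lock:
            events.append(("ran", index))

    threads = []
    with gate:  # the creator owns the gate while it builds the whole chain
        for index in range(count):
            thread = threading.Thread(target=worker, args=(index,))
            thread.start()
            threads.append(thread)
            with log_lock:
                events.append(("created", index))

    # The gate is released here, so the workers may now proceed.
    for thread in threads:
        thread.join()
    return events


if __name__ == "__main__":
    for kind, index in start_gated_workers(4):
        print(kind, index)
```

  With a working gate, every "created" entry is recorded before any "ran"
entry, whatever order the workers then run in.  If the lock did not really
block (for instance because of a bug in the underlying thread primitives), a
worker could finish before the creator had got through its loop, which is the
sort of symptom the task list above appears to show.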