From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 23531 invoked by alias); 7 Nov 2007 15:57:09 -0000
Received: (qmail 23516 invoked by uid 9453); 7 Nov 2007 15:57:09 -0000
Date: Wed, 07 Nov 2007 15:57:00 -0000
Message-ID: <20071107155709.23515.qmail@sourceware.org>
From: teigland@sourceware.org
To: cluster-cvs@sources.redhat.com
Subject: cluster/dlm-kernel/src lockqueue.c
Mailing-List: contact cluster-cvs-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: cluster-cvs-owner@sourceware.org
X-SW-Source: 2007-q4/txt/msg00169.txt.bz2

CVSROOT:      /cvs/cluster
Module name:  cluster
Branch:       RHEL46
Changes by:   teigland@sourceware.org 2007-11-07 15:57:09

Modified files:
    dlm-kernel/src : lockqueue.c

Log message:
    bz 349001

    For the entire life of the dlm, there's been an annoying issue that
    we've worked around and not "fixed" directly. It's the source of all
    these messages:

      process_lockqueue_reply id 2c0224 state 0

    The problem is that a lock master sends an async "granted" message for
    a convert request *before* actually sending the reply for the original
    convert. The work-around is that the requesting node just takes the
    granted message as an implicit reply to the conversion and ignores the
    convert reply when it arrives later (the message above is printed when
    it gets the out-of-order reply for its convert). Apart from the
    annoying messages, it's never been a problem.

    Now we've found a case where it's a real problem:

    1. nodeA: send convert PR->CW to nodeB
       nodeB: send granted message to nodeA
       nodeB: send convert reply to nodeA

    2. nodeA: receive granted message for conversion;
       complete request, sending ast to gfs

    3. nodeA: send convert CW->EX to nodeB

    4. nodeA: receive reply for convert in step 1, which we ordinarily
       ignore, but since another convert has been sent, we mistake this
       message as the reply for the convert in step 3, and complete
       the convert request which is *not* really completed yet

    5. nodeA: send unlock to nodeB
       nodeB: complains about an unlock during a conversion

    The fix is to have nodeB not send a convert reply if it has already
    sent a granted message. (We already do this for cases where the
    conversion is granted when first processing it, but we don't in
    cases where the grant is done after processing the convert.)

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/dlm-kernel/src/lockqueue.c.diff?cvsroot=cluster&only_with_tag=RHEL46&r1=1.37.2.9&r2=1.37.2.9.6.1
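
The race in steps 1-4 and the effect of the fix can be illustrated with a small toy model. This is only a sketch and is not the code in lockqueue.c: the names `Requester`, `master_messages` and `simulate` are hypothetical, and the model assumes that nodeA treats any convert reply arriving while a convert is outstanding as the reply to that convert, as the log message describes.

```python
from dataclasses import dataclass
from typing import List, Optional

GRANTED = "granted"              # async grant message from the master
CONVERT_REPLY = "convert_reply"  # reply to a convert request


@dataclass
class Requester:
    """Hypothetical model of nodeA's view of a single lock."""
    mode: str = "PR"
    pending: Optional[str] = None  # mode requested by the outstanding convert

    def send_convert(self, new_mode: str) -> None:
        self.pending = new_mode

    def receive(self, msg: str) -> None:
        if self.pending is None:
            # Nothing outstanding: this is the out-of-order reply that is
            # normally ignored (the "process_lockqueue_reply" message case).
            return
        # Either message is taken as completing the outstanding convert.
        # This is the work-around that makes the stale reply dangerous.
        self.mode = self.pending
        self.pending = None


def master_messages(suppress_reply_after_grant: bool) -> List[str]:
    """Messages nodeB sends for the PR->CW convert in step 1."""
    msgs = [GRANTED]
    if not suppress_reply_after_grant:
        msgs.append(CONVERT_REPLY)  # old behaviour: reply follows the grant
    return msgs


def simulate(fixed: bool) -> bool:
    """Return True if nodeA wrongly believes it holds EX."""
    node_a = Requester()
    node_a.send_convert("CW")           # step 1
    in_flight = master_messages(fixed)
    node_a.receive(in_flight.pop(0))    # step 2: granted message arrives
    node_a.send_convert("EX")           # step 3
    for msg in in_flight:               # step 4: leftover reply from step 1
        node_a.receive(msg)
    # nodeB never answered the CW->EX convert, so EX here is a false completion.
    return node_a.mode == "EX"


if __name__ == "__main__":
    print("old behaviour, false completion:", simulate(fixed=False))
    print("with fix, false completion:", simulate(fixed=True))
```

In this model, suppressing the convert reply once a granted message has been sent leaves nothing stale in flight, so the CW->EX convert stays outstanding until nodeB actually answers it.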