From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-74873-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 72098 invoked by alias); 17 Nov 2016 14:47:06 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 72030 invoked by uid 89); 17 Nov 2016 14:47:06 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=UD:The, commitdiff, glibcgit, glibc.git
X-HELO: mx1.redhat.com
Message-ID: <1479394010.7146.1154.camel@localhost.localdomain>
Subject: Re: BZ 20822 :powerpc: race condition in __lll_unlock_elision
From: Torvald Riegel <triegel@redhat.com>
To: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Cc: libc-alpha@sourceware.org, aaron Sawdey <acsawdey@linux.vnet.ibm.com>,
        Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
        Steve Munroe
 <sjmunroe@us.ibm.co>, carlos@redhat.com,
        adhemerval.zanella@linaro.org, adconrad@ubuntu.com,
        wschmidt@linux.vnet.ibm.com
Date: Thu, 17 Nov 2016 14:47:00 -0000
In-Reply-To: <f777eebe-4a7b-87d2-1281-ab605c7fce8b@linux.vnet.ibm.com>
References: <f777eebe-4a7b-87d2-1281-ab605c7fce8b@linux.vnet.ibm.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-SW-Source: 2016-11/txt/msg00626.txt.bz2

On Tue, 2016-11-15 at 21:42 +0530, Rajalakshmi Srinivasaraghavan wrote:
> The initial problem reported was memory corruption in MongoDB only on 
> Ubuntu 16.04 ppc64le(not on Ubuntu 15).
> The problem was not easily recreatable and debugging with many tools and 
> creative ideas, the problem is narrowed down to lock elision in glibc.
> Ubuntu 16.04 has glibc 2.23 with lock elision enabled by default which 
> made the testcase fail only on 16.04.
> 
> As stated in BZ 20822, short description is
> 
> "The update of *adapt_count after the release of the lock causes a race 
> condition. Thread A unlocks, thread B continues and returns, destroying 
> mutex on stack,
>   then gets into another function, thread A writes to *adapt_count and 
> corrupts stack.The window is very narrow and requires that the
> machine be in smt mode, so likely the two threads have to be on the same 
> core in order to hit this"
> 
> Looking back the file changes in 
> sysdeps/unix/sysv/linux/powerpc/elision-unlock.c, suspecting the commit
>   https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=fadd2ad9cc36115440d50b0eae9299e65988917d which is introduced to improve
> the performance.
> 
> Thinking of the following fixes.
> 
> 1) Update of adapt_count before the unlock. But I have still not 
> identified if its going to affect the performance.

Because of the requirements on mutex destruction (and same for rwlock
etc.), a thread must not access any data associated with the mutex after
having unlocked it (this includes reads because it's correct to also
unmap the memory holding the mutex after destroying the mutex). 

> 2) In elision-lock.c/elision-trylock.c update adapt_count right after 
> the lock acquire at the end of the function.
> In either case we have to be careful not to introduce another race 
> condition.

Every access to mutex data outside of transactions must use atomic
operations if there would be data races otherwise (according to the C11
memory model).  I know that this isn't followed in the current mutex
code, but it really should be, and your change qualifies as new code so
please take care of this too.

I also think that if we can have concurrent accesses to a particular
variable both inside and outside of transactions, we should use atomic
accesses inside the transactions too, just for consistency.