Re: [RFC PATCH 1/4] rseq: Add sched_state field to struct rseq

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	"H . Peter Anvin" <hpa@zytor.com>, "Paul Turner" <pjt@google.com>,
	linux-api@vger.kernel.org,
	"Christian Brauner" <brauner@kernel.org>,
	"Florian Weimer" <fw@deneb.enyo.de>,
	David.Laight@aculab.com, carlos@redhat.com,
	"Peter Oskolkov" <posk@posk.io>,
	"Alexander Mikhalitsyn" <alexander@mihalicyn.com>,
	"Chris Kennelly" <ckennelly@google.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Darren Hart" <dvhart@infradead.org>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	"André Almeida" <andrealmeid@igalia.com>,
	libc-alpha@sourceware.org, "Steven Rostedt" <rostedt@goodmis.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Florian Weimer" <fweimer@redhat.com>
Subject: Re: [RFC PATCH 1/4] rseq: Add sched_state field to struct rseq
Date: Tue, 23 May 2023 10:10:40 -0400	[thread overview]
Message-ID: <18286958-df67-f5c8-157a-9b0e8764a299@efficios.com> (raw)
In-Reply-To: <ZGevZxOjJLMO9zlM@boqun-archlinux>

On 2023-05-19 13:18, Boqun Feng wrote:
[...]
> 
> The case in my mind is the opposite direction: the loads from other
> threads delay the stores to rseq_cs on the current thread, which I
> assume are usually a fast path. For example:

Yes, OK, you are correct. And I just validated on my end that busy-waiting
repeatedly loading from a cache line does slow down the concurrent stores
to other variables on that cache line significantly (at least on my
Intel(R) Core(TM) i7-8650U). Small reproducer provided at the end of
this email. Results:

compudj@thinkos:~/test$ time ./test-cacheline -d
thread id : 140242706274048, pid 16940
thread id : 140242697881344, pid 16940

real	0m4.145s
user	0m8.289s
sys	0m0.000s

compudj@thinkos:~/test$ time ./test-cacheline -s
thread id : 139741482387200, pid 16950
thread id : 139741473994496, pid 16950

real	0m4.573s
user	0m9.147s
sys	0m0.000s


> 
> 	CPU 1				CPU 2
> 
> 	lock(foo); // holding a lock
> 	rseq_start():
> 	  <CPU 1 own the cache line exclusively>
> 	  				lock(foo):
> 					  <fail to get foo>
> 					  <check whether the lock owner is on CPU>
> 					  <cache line becames shared>
> 	  ->rseq_cs = .. // Need to invalidate the cache line on other CPU
> 
> But as you mentioned, there is only one updater here (the current
> thread), so maybe it doesn't matter... but since it's a userspace ABI,
> so I cannot help thinking "what if there is another bit that has a
> different usage pattern introduced in the future", so..

Yes, however we have to be careful about how we introduce this considering
that the rseq feature extensions are "append only" to the structure feature
size exported by the kernel to userspace through getauxval(3).

So if we decide that we create a big hole right in the middle of the rseq_abi
for cacheline alignment, that's a possibility, but we'd really be wasting an
entire cacheline for a single bit.

Another possibility would be to add a level of indirection: we could have a field
in struct rseq which is either a pointer or offset from the thread_pointer() to
the on-cpu bit, which would sit in a different cache line. It would be up to
glibc to allocate space for it, possibly at the end of the rseq_abi field.

> 
>> Note that the heavy cache-line bouncing in my test-case happens on the lock
>> structure (cmpxchg expecting NULL, setting the current thread rseq_get_abi()
>> pointer on success). There are probably better ways to implement that part,
>> it is currently just a simple prototype showcasing the approach.
>>
> 
> Yeah.. that's a little strange, I guess you can just read the lock
> owner's rseq_abi, for example:
> 
> 	rseq_lock_slowpath() {
> 		struct rseq_abi *other_rseq = lock->owner;
> 
> 		if (RSEQ_ACCESS_ONCE(other_rseq->sched_state)) {
> 			...
> 		}
> 	}

Yes, I don't think the load of the owner pointer needs to be part of the
cmpxchg per se. It could be done from a load on the slow-path.

This way we would not require that the owner id and the lock state be the
same content, and this would allow much more freedom for the fast-path
semantic.

Thanks,

Mathieu

> 
> ?
> 
> Regards,
> Boqun
> 
>> Thanks,
>>
>> Mathieu
>>
>> -- 
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
>>

Reproducer:

/*
  * cacheline testing (exclusive vs shared store speed)
  *
  * build with gcc -O2 -pthread -o test-cacheline test-cacheline.c
  *
  * Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
  * License: MIT
  */

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <rseq/rseq.h>

#define NR_THREADS 2

struct test {
	int a;
	int b;
} __attribute__((aligned(256)));

enum testcase {
	TEST_SAME_CACHELINE,
	TEST_OTHER_CACHELINE,
};

static enum testcase testcase;
static int test_stop, test_go;
static struct test test, test2;

static
void *testthread(void *arg)
{
	long nr = (long)arg;

         printf("thread id : %lu, pid %lu\n", pthread_self(), getpid());

	__atomic_add_fetch(&test_go, 1, __ATOMIC_RELAXED);
	while (RSEQ_READ_ONCE(test_go) < NR_THREADS)
		rseq_barrier();
	if (nr == 0) {
		switch (testcase) {
		case TEST_SAME_CACHELINE:
			while (!RSEQ_READ_ONCE(test_stop))
				(void) RSEQ_READ_ONCE(test.a);
			break;
		case TEST_OTHER_CACHELINE:
			while (!RSEQ_READ_ONCE(test_stop))
				(void) RSEQ_READ_ONCE(test2.a);
			break;
		}
	} else if (nr == 1) {
		unsigned long long i;

		for (i = 0; i < 16000000000UL; i++)
			RSEQ_WRITE_ONCE(test.b, i);
		RSEQ_WRITE_ONCE(test_stop, 1);
	}
         return ((void*)0);
}

static
void show_usage(char **argv)
{
	fprintf(stderr, "Usage: %s <OPTIONS>\n", argv[0]);
	fprintf(stderr, "OPTIONS:\n");
	fprintf(stderr, "	[-s] Same cacheline\n");
	fprintf(stderr, "	[-d] Different cacheline\n");
}

static
int parse_args(int argc, char **argv)
{
	if (argc != 2 || argv[1][0] != '-') {
		show_usage(argv);
		return -1;
	}
	switch (argv[1][1]) {
	case 's':
		testcase = TEST_SAME_CACHELINE;
		break;
	case 'd':
		testcase = TEST_OTHER_CACHELINE;
		break;
	default:
		show_usage(argv);
		return -1;
	}
	return 0;
}

int main(int argc, char **argv)
{
         pthread_t testid[NR_THREADS];
         void *tret;
         int i, err;

	if (parse_args(argc, argv))
		exit(1);

         for (i = 0; i < NR_THREADS; i++) {
                 err = pthread_create(&testid[i], NULL, testthread,
                         (void *)(long)i);
                 if (err != 0)
                         exit(1);
         }

         for (i = 0; i < NR_THREADS; i++) {
                 err = pthread_join(testid[i], &tret);
                 if (err != 0)
                         exit(1);
         }

         return 0;
}



-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

next prev parent reply	other threads:[~2023-05-23 14:10 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-17 15:26 [RFC PATCH 0/4] Extend rseq with sched_state field Mathieu Desnoyers
2023-05-17 15:26 ` [RFC PATCH 1/4] rseq: Add sched_state field to struct rseq Mathieu Desnoyers
2023-05-17 16:03   ` Davidlohr Bueso
2023-05-18 21:49   ` Boqun Feng
2023-05-19 14:15     ` Mathieu Desnoyers
2023-05-19 17:18       ` Boqun Feng
2023-05-23 14:10         ` Mathieu Desnoyers [this message]
2023-05-19 20:51   ` Noah Goldstein
2023-05-23 12:49     ` Mathieu Desnoyers
2023-05-23 16:32       ` Noah Goldstein
2023-05-23 17:30         ` Mathieu Desnoyers
2023-05-23 20:10           ` Noah Goldstein
2023-05-17 15:26 ` [RFC PATCH 2/4] selftests/rseq: Add sched_state rseq field and getter Mathieu Desnoyers
2023-05-28 14:04   ` kernel test robot
2023-05-17 15:26 ` [RFC PATCH 3/4] selftests/rseq: Implement sched state test program Mathieu Desnoyers
2023-05-17 15:26 ` [RFC PATCH 4/4] selftests/rseq: Implement rseq_mutex " Mathieu Desnoyers
2023-05-17 16:07 ` [RFC PATCH 0/4] Extend rseq with sched_state field Davidlohr Bueso
2023-05-17 18:36 ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=18286958-df67-f5c8-157a-9b0e8764a299@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=David.Laight@aculab.com \
    --cc=alexander@mihalicyn.com \
    --cc=andrealmeid@igalia.com \
    --cc=boqun.feng@gmail.com \
    --cc=brauner@kernel.org \
    --cc=carlos@redhat.com \
    --cc=ckennelly@google.com \
    --cc=corbet@lwn.net \
    --cc=dave@stgolabs.net \
    --cc=dvhart@infradead.org \
    --cc=fw@deneb.enyo.de \
    --cc=fweimer@redhat.com \
    --cc=hpa@zytor.com \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=posk@posk.io \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).