From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"H . Peter Anvin" <hpa@zytor.com>, "Paul Turner" <pjt@google.com>,
	linux-api@vger.kernel.org,
	"Christian Brauner" <brauner@kernel.org>,
	"Florian Weimer" <fw@deneb.enyo.de>,
	David.Laight@ACULAB.COM, carlos@redhat.com,
	"Peter Oskolkov" <posk@posk.io>,
	"Alexander Mikhalitsyn" <alexander@mihalicyn.com>,
	"Chris Kennelly" <ckennelly@google.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Darren Hart" <dvhart@infradead.org>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	"André Almeida" <andrealmeid@igalia.com>,
	libc-alpha@sourceware.org, "Steven Rostedt" <rostedt@goodmis.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Florian Weimer" <fweimer@redhat.com>
Subject: [RFC PATCH 1/4] rseq: Add sched_state field to struct rseq
Date: Wed, 17 May 2023 11:26:51 -0400
Message-ID: <20230517152654.7193-2-mathieu.desnoyers@efficios.com>
In-Reply-To: <20230517152654.7193-1-mathieu.desnoyers@efficios.com>

Expose the "on-cpu" state for each thread through struct rseq to allow
adaptive mutexes to decide more accurately between busy-waiting and
calling sys_futex() to release the CPU, based on the on-cpu state of the
mutex owner.

It is only provided as an optimization hint, because there is no
guarantee that the page containing this field is faulted in, and
therefore the scheduler may fail to clear the on-cpu state on
preemption. This is expected to be rare though, and is resolved as soon
as the task returns to user-space.

The goal is to improve use-cases where the duration of the critical
sections for a given lock follows a multi-modal distribution, preventing
statistical guesses from doing a good job at choosing between busy-wait
and futex wait behavior.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: libc-alpha@sourceware.org
---
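The following is a minimal user-space sketch of how a lock slow path
might consume the hint described above. It is not part of this patch.
The names SKETCH_SCHED_STATE_ON_CPU, load_sched_state(), choose_wait()
and the max_spins bound are hypothetical and chosen for illustration
only. The sketch assumes the waiter already has a pointer to the lock
owner's sched_state word. The spin bound is there because the hint can
be stale when the scheduler fails to clear the bit on preemption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Mirrors RSEQ_SCHED_STATE_ON_CPU from this patch (illustrative copy). */
#define SKETCH_SCHED_STATE_ON_CPU	(1U << 0)

/* The kernel field is a plain __u32; this sketch reads it atomically. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
	      "atomic view must match the 32-bit sched_state layout");

enum wait_action {
	WAIT_SPIN,	/* keep busy-waiting on the lock word */
	WAIT_FUTEX,	/* block in the kernel, releasing the CPU */
};

/*
 * Hypothetical helper: read another thread's sched_state with a relaxed
 * atomic load. This provides single-copy atomicity and no ordering.
 */
static inline uint32_t load_sched_state(_Atomic uint32_t *sched_state)
{
	return atomic_load_explicit(sched_state, memory_order_relaxed);
}

/*
 * Hypothetical policy: spin only while the owner appears to be on a CPU,
 * and never for more than max_spins iterations, since the hint may be
 * stale.
 */
static inline enum wait_action choose_wait(uint32_t owner_state,
					   unsigned int spins_done,
					   unsigned int max_spins)
{
	if (spins_done >= max_spins)
		return WAIT_FUTEX;
	if (owner_state & SKETCH_SCHED_STATE_ON_CPU)
		return WAIT_SPIN;
	return WAIT_FUTEX;
}
```

A waiter would call choose_wait(load_sched_state(owner_state_ptr), n, max)
on each iteration of its slow path and fall back to a futex wait as soon
as WAIT_FUTEX is returned.
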
 include/linux/sched.h     | 12 ++++++++++++
 include/uapi/linux/rseq.h | 17 +++++++++++++++++
 kernel/rseq.c             | 14 ++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index eed5d65b8d1f..c7e9248134c1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2351,11 +2351,20 @@ static inline void rseq_signal_deliver(struct ksignal *ksig,
 	rseq_handle_notify_resume(ksig, regs);
 }
 
+void __rseq_set_sched_state(struct task_struct *t, unsigned int state);
+
+static inline void rseq_set_sched_state(struct task_struct *t, unsigned int state)
+{
+	if (t->rseq)
+		__rseq_set_sched_state(t, state);
+}
+
 /* rseq_preempt() requires preemption to be disabled. */
 static inline void rseq_preempt(struct task_struct *t)
 {
 	__set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask);
 	rseq_set_notify_resume(t);
+	rseq_set_sched_state(t, 0);
 }
 
 /* rseq_migrate() requires preemption to be disabled. */
@@ -2405,6 +2414,9 @@ static inline void rseq_signal_deliver(struct ksignal *ksig,
 				       struct pt_regs *regs)
 {
 }
+static inline void rseq_set_sched_state(struct task_struct *t, unsigned int state)
+{
+}
 static inline void rseq_preempt(struct task_struct *t)
 {
 }
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index c233aae5eac9..c6d8537e23ca 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -37,6 +37,13 @@ enum rseq_cs_flags {
 		(1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT),
 };
 
+enum rseq_sched_state {
+	/*
+	 * Task is currently running on a CPU if bit is set.
+	 */
+	RSEQ_SCHED_STATE_ON_CPU		= (1U << 0),
+};
+
 /*
  * struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always
  * contained within a single cache-line. It is usually declared as
@@ -148,6 +155,16 @@ struct rseq {
 	 */
 	__u32 mm_cid;
 
+	/*
+	 * Restartable sequences sched_state field. Updated by the kernel. Read
+	 * by user-space with single-copy atomicity semantics. This field can
+	 * be read by any userspace thread. Aligned on 32-bit. Contains a
+	 * bitmask of enum rseq_sched_state. This field is provided as a hint
+	 * by the scheduler, and requires that the page holding struct rseq is
+	 * faulted-in for the state update to be performed by the scheduler.
+	 */
+	__u32 sched_state;
+
 	/*
 	 * Flexible array member at end of structure, after last feature field.
 	 */
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 9de6e35fe679..b2eb3bbaa9ef 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -91,6 +91,7 @@ static int rseq_update_cpu_node_id(struct task_struct *t)
 	u32 cpu_id = raw_smp_processor_id();
 	u32 node_id = cpu_to_node(cpu_id);
 	u32 mm_cid = task_mm_cid(t);
+	u32 sched_state = RSEQ_SCHED_STATE_ON_CPU;
 
 	WARN_ON_ONCE((int) mm_cid < 0);
 	if (!user_write_access_begin(rseq, t->rseq_len))
@@ -99,6 +100,7 @@ static int rseq_update_cpu_node_id(struct task_struct *t)
 	unsafe_put_user(cpu_id, &rseq->cpu_id, efault_end);
 	unsafe_put_user(node_id, &rseq->node_id, efault_end);
 	unsafe_put_user(mm_cid, &rseq->mm_cid, efault_end);
+	unsafe_put_user(sched_state, &rseq->sched_state, efault_end);
 	/*
 	 * Additional feature fields added after ORIG_RSEQ_SIZE
 	 * need to be conditionally updated only if
@@ -339,6 +341,18 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 	force_sigsegv(sig);
 }
 
+/*
+ * Attempt to update rseq scheduler state.
+ */
+void __rseq_set_sched_state(struct task_struct *t, unsigned int state)
+{
+	if (unlikely(t->flags & PF_EXITING))
+		return;
+	pagefault_disable();
+	(void) put_user(state, &t->rseq->sched_state);
+	pagefault_enable();
+}
+
 #ifdef CONFIG_DEBUG_RSEQ
 
 /*
-- 
2.25.1

