@@ -23,6 +23,14 @@ over a rather long period of time, but improvements are always welcome!
 Yet another exception is where the low real-time latency of RCU's
 read-side primitives is critically important.

+One final exception is where RCU readers are used to prevent
+the ABA problem (https://en.wikipedia.org/wiki/ABA_problem)
+for lockless updates. This does result in the mildly
+counter-intuitive situation where rcu_read_lock() and
+rcu_read_unlock() are used to protect updates; however, this
+approach provides the same potential simplifications that garbage
+collectors do.
+
 1. Does the update code have proper mutual exclusion?

 RCU does allow -readers- to run (almost) naked, but -writers- must
@@ -40,7 +48,9 @@ over a rather long period of time, but improvements are always welcome!
 explain how this single task does not become a major bottleneck on
 big multiprocessor machines (for example, if the task is updating
 information relating to itself that other tasks can read, there
-by definition can be no bottleneck).
+by definition can be no bottleneck). Note that the definition
+of "large" has changed significantly: Eight CPUs was "large"
+in the year 2000, but a hundred CPUs was unremarkable in 2017.

 2. Do the RCU read-side critical sections make proper use of
 rcu_read_lock() and friends? These primitives are needed
@@ -55,6 +65,12 @@ over a rather long period of time, but improvements are always welcome!
 Disabling of preemption can serve as rcu_read_lock_sched(), but
 is less readable.

+Letting RCU-protected pointers "leak" out of an RCU read-side
+critical section is every bit as bad as letting them leak out
+from under a lock. Unless, of course, you have arranged some
+other means of protection, such as a lock or a reference count
+-before- letting them out of the RCU read-side critical section.
+
 3. Does the update code tolerate concurrent accesses?

 The whole point of RCU is to permit readers to run without
@@ -78,10 +94,10 @@ over a rather long period of time, but improvements are always welcome!

 This works quite well, also.

-c. Make updates appear atomic to readers. For example,
+c. Make updates appear atomic to readers. For example,
 pointer updates to properly aligned fields will
 appear atomic, as will individual atomic primitives.
-Sequences of perations performed under a lock will -not-
+Sequences of operations performed under a lock will -not-
 appear to be atomic to RCU readers, nor will sequences
 of multiple atomic primitives.

@@ -168,8 +184,8 @@ over a rather long period of time, but improvements are always welcome!

 5. If call_rcu(), or a related primitive such as call_rcu_bh(),
 call_rcu_sched(), or call_srcu() is used, the callback function
-must be written to be called from softirq context. In particular,
-it cannot block.
+will be called from softirq context. In particular, it cannot
+block.

 6. Since synchronize_rcu() can block, it cannot be called from
 any sort of irq context. The same rule applies for
@@ -178,11 +194,14 @@ over a rather long period of time, but improvements are always welcome!
 synchronize_sched_expedited(), and synchronize_srcu_expedited().

 The expedited forms of these primitives have the same semantics
-as the non-expedited forms, but expediting is both expensive
-and unfriendly to real-time workloads. Use of the expedited
-primitives should be restricted to rare configuration-change
-operations that would not normally be undertaken while a real-time
-workload is running.
+as the non-expedited forms, but expediting is both expensive and
+(with the exception of synchronize_srcu_expedited()) unfriendly
+to real-time workloads. Use of the expedited primitives should
+be restricted to rare configuration-change operations that would
+not normally be undertaken while a real-time workload is running.
+However, real-time workloads can use the rcupdate.rcu_normal kernel
+boot parameter to completely disable expedited grace periods,
+though this might have performance implications.

 In particular, if you find yourself invoking one of the expedited
 primitives repeatedly in a loop, please do everyone a favor:
@@ -193,11 +212,6 @@ over a rather long period of time, but improvements are always welcome!
 of the system, especially to real-time workloads running on
 the rest of the system.

-In addition, it is illegal to call the expedited forms from
-a CPU-hotplug notifier, or while holding a lock that is acquired
-by a CPU-hotplug notifier. Failing to observe this restriction
-will result in deadlock.
-
 7. If the updater uses call_rcu() or synchronize_rcu(), then the
 corresponding readers must use rcu_read_lock() and
 rcu_read_unlock(). If the updater uses call_rcu_bh() or
@@ -321,7 +335,7 @@ over a rather long period of time, but improvements are always welcome!
 Similarly, disabling preemption is not an acceptable substitute
 for rcu_read_lock(). Code that attempts to use preemption
 disabling where it should be using rcu_read_lock() will break
-in real-time kernel builds.
+in CONFIG_PREEMPT=y kernel builds.

 If you want to wait for interrupt handlers, NMI handlers, and
 code under the influence of preempt_disable(), you instead
@@ -356,23 +370,22 @@ over a rather long period of time, but improvements are always welcome!
 not the case, a self-spawning RCU callback would prevent the
 victim CPU from ever going offline.)

-14. SRCU (srcu_read_lock(), srcu_read_unlock(), srcu_dereference(),
-synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu())
-may only be invoked from process context. Unlike other forms of
-RCU, it -is- permissible to block in an SRCU read-side critical
-section (demarked by srcu_read_lock() and srcu_read_unlock()),
-hence the "SRCU": "sleepable RCU". Please note that if you
-don't need to sleep in read-side critical sections, you should be
-using RCU rather than SRCU, because RCU is almost always faster
-and easier to use than is SRCU.
+14. Unlike other forms of RCU, it -is- permissible to block in an
+SRCU read-side critical section (demarked by srcu_read_lock()
+and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
+Please note that if you don't need to sleep in read-side critical
+sections, you should be using RCU rather than SRCU, because RCU
+is almost always faster and easier to use than is SRCU.

-Also unlike other forms of RCU, explicit initialization
-and cleanup is required via init_srcu_struct() and
-cleanup_srcu_struct(). These are passed a "struct srcu_struct"
-that defines the scope of a given SRCU domain. Once initialized,
-the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock()
-synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu().
-A given synchronize_srcu() waits only for SRCU read-side critical
+Also unlike other forms of RCU, explicit initialization and
+cleanup is required either at build time via DEFINE_SRCU()
+or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
+and cleanup_srcu_struct(). These last two are passed a
+"struct srcu_struct" that defines the scope of a given
+SRCU domain. Once initialized, the srcu_struct is passed
+to srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(),
+synchronize_srcu_expedited(), and call_srcu(). A given
+synchronize_srcu() waits only for SRCU read-side critical
 sections governed by srcu_read_lock() and srcu_read_unlock()
 calls that have been passed the same srcu_struct. This property
 is what makes sleeping read-side critical sections tolerable --
@@ -390,10 +403,16 @@ over a rather long period of time, but improvements are always welcome!
 Therefore, SRCU should be used in preference to rw_semaphore
 only in extremely read-intensive situations, or in situations
 requiring SRCU's read-side deadlock immunity or low read-side
-realtime latency.
+realtime latency. You should also consider percpu_rw_semaphore
+when you need lightweight readers.

-Note that, rcu_assign_pointer() relates to SRCU just as it does
-to other forms of RCU.
+SRCU's expedited primitive (synchronize_srcu_expedited())
+never sends IPIs to other CPUs, so it is easier on
+real-time workloads than is synchronize_rcu_expedited(),
+synchronize_rcu_bh_expedited() or synchronize_sched_expedited().
+
+Note that rcu_dereference() and rcu_assign_pointer() relate to
+SRCU just as they do to other forms of RCU.

 15. The whole point of call_rcu(), synchronize_rcu(), and friends
 is to wait until all pre-existing readers have finished before
@@ -435,3 +454,33 @@ over a rather long period of time, but improvements are always welcome!

 These debugging aids can help you find problems that are
 otherwise extremely difficult to spot.
+
+18. If you register a callback using call_rcu(), call_rcu_bh(),
+call_rcu_sched(), or call_srcu(), and pass in a function defined
+within a loadable module, then it is necessary to wait for
+all pending callbacks to be invoked after the last invocation
+and before unloading that module. Note that it is absolutely
+-not- sufficient to wait for a grace period! The current (say)
+synchronize_rcu() implementation waits only for all previous
+callbacks registered on the CPU that synchronize_rcu() is running
+on, but it is -not- guaranteed to wait for callbacks registered
+on other CPUs.
+
+You instead need to use one of the barrier functions:
+
+o call_rcu() -> rcu_barrier()
+o call_rcu_bh() -> rcu_barrier_bh()
+o call_rcu_sched() -> rcu_barrier_sched()
+o call_srcu() -> srcu_barrier()
+
+However, these barrier functions are absolutely -not- guaranteed
+to wait for a grace period. In fact, if there are no call_rcu()
+callbacks waiting anywhere in the system, rcu_barrier() is within
+its rights to return immediately.
+
+So if you need to wait for both an RCU grace period and for
+all pre-existing call_rcu() callbacks, you will need to execute
+both rcu_barrier() and synchronize_rcu(), if necessary, using
+something like workqueues to execute them concurrently.
+
+See rcubarrier.txt for more information.
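
Item 3c of the checklist notes that a pointer update to a properly
aligned field appears atomic to readers. The following is a minimal
userspace sketch of that idea, not kernel code: it assumes C11 atomics
as a stand-in for rcu_assign_pointer() and rcu_dereference(), and the
names struct config, config_publish(), and config_span() are
hypothetical, invented only for illustration. It deliberately omits
grace periods, so deciding when the old record may be freed is left to
the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical record: readers must never see "low" and "high" mismatched. */
struct config {
	int low;
	int high;
};

/* Shared pointer through which readers locate the current record. */
static _Atomic(struct config *) current_config;

/*
 * Updater: build the new record privately, then publish it with a
 * single atomic exchange (assumed stand-in for rcu_assign_pointer()).
 * The previous record is handed back through *oldp; under real RCU it
 * could be freed only after a grace period has elapsed.
 * Returns 0 on success, -1 if allocation fails.
 */
int config_publish(int low, int high, struct config **oldp)
{
	struct config *newc;

	assert(oldp != NULL);
	newc = malloc(sizeof(*newc));
	if (!newc)
		return -1;
	newc->low = low;	/* Initialize fully before publication. */
	newc->high = high;
	*oldp = atomic_exchange_explicit(&current_config, newc,
					 memory_order_acq_rel);
	return 0;
}

/*
 * Reader: one acquire load (assumed stand-in for rcu_dereference())
 * yields either the old record or the new one, never a mixture.
 * Returns high - low, or -1 if nothing has been published yet.
 */
int config_span(void)
{
	struct config *c = atomic_load_explicit(&current_config,
						memory_order_acquire);

	return c ? c->high - c->low : -1;
}
```

Updating "low" and "high" in place under a lock would not give readers
this guarantee, which is the contrast item 3c draws: only the single
pointer store is atomic from the reader's point of view.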