Add a ctl_table_root pointer to ctl_table set so it is easy to
go from a ctl_table_set to a ctl_table_root.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Replace sysctl_head_next with first_entry and next_entry. These new
iterators operate at the level of sysctl table entries and filter
out any sysctl tables that should not be shown.
Utilizing two specialized functions instead of a single function removes
conditionals for handling awkward special cases that only come up
at the beginning of iteration, making the iterators easier to read
and understand.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Replace the helpers that proc_sys_lookup uses with helpers that work
in terms of an entire sysctl directory. This is worse for sysctl_lock
hold times but it is much better for code clarity and the code cleanups
to come.
find_in_table is no longer needed so it is removed.
find_entry a general helper to find entries in a directory is added.
lookup_entry is a simple wrapper around find_entry that takes the
sysctl_lock increases the use count if an entry is found and drops
the sysctl_lock.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Every other directory has a .child member and we look at the .child
for our entries. Do the same for the root_table.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Add nreg to ctl_table_header. When nreg drops to 0 the ctl_table_header
will be unregistered.
Factor out drop_sysctl_table from unregister_sysctl_table, and add
the logic for decrementing nreg.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Instead of relying on sysct_head_next(NULL) to magically
return the right header for the root directory instead
explicitly transform NULL into the root directories header.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
While useful at one time for selinux and the sysctl sanity
checks those users no longer use the parent field and we can
safely remove it.
Inspired-by: Lucian Adrian Grijincu <lucian.grijincu@gmil.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Stop validating subdirectories now that we only register leaf tables
- Cleanup and improve the duplicate filename check.
* Run the duplicate filename check under the sysctl_lock to guarantee
we never add duplicate names.
* Reduce the duplicate filename check to nearly O(M*N) where M is the
number of entries in tthe table we are registering and N is the
number of entries in the directory before we got there.
- Move the duplicate filename check into it's own function and call
it directtly from __register_sysctl_table
- Kill the config option as the sanity checks are now cheap enough
the config option is unnecessary. The original reason for the config
option was because we had a huge table used to verify the proc filename
to binary sysctl mapping. That table has now evolved into the binary_sysctl
translation layer and is no longer part of the sysctl_check code.
- Tighten up the permission checks. Guarnateeing that files only have read
or write permissions.
- Removed redudant check for parents having a procname as now everything has
a procname.
- Generalize the backtrace logic so that we print a backtrace from
any failure of __register_sysctl_table that was not caused by
a memmory allocation failure. The backtrace allows us to track
down who erroneously registered a sysctl table.
Bechmark before (CONFIG_SYSCTL_CHECK=y):
make-dummies 0 999 -> 12s
rmmod dummy -> 0.08s
Bechmark before (CONFIG_SYSCTL_CHECK=n):
make-dummies 0 999 -> 0.7s
rmmod dummy -> 0.06s
make-dummies 0 99999 -> 1m13s
rmmod dummy -> 0.38s
Benchmark after:
make-dummies 0 999 -> 0.65s
rmmod dummy -> 0.055s
make-dummies 0 9999 -> 1m10s
rmmod dummy -> 0.39s
The sysctl sanity checks now impose no measurable cost.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Split the registration of a complex ctl_table array which may have
arbitrary numbers of directories (->child != NULL) and tables of files
into a series of simpler registrations that only register tables of files.
Graphically:
register('dir', { + file-a
+ file-b
+ subdir1
+ file-c
+ subdir2
+ file-d
+ file-e })
is transformed into:
wrapper->subheaders[0] = register('dir', {file1-a, file1-b})
wrapper->subheaders[1] = register('dir/subdir1', {file-c})
wrapper->subheaders[2] = register('dir/subdir2', {file-d, file-e})
return wrapper
This guarantees that __register_sysctl_table will only see a simple
ctl_table array with all entries having (->child == NULL).
Care was taken to pass the original simple ctl_table arrays to
__register_sysctl_table whenever possible.
This change is derived from a similar patch written
by Lucrian Grijincu.
Inspired-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
For any component of table passed to __register_sysctl_paths
that actually serves as a path, add that to the cstring path
that is passed to __register_sysctl_table.
The result is that for most calls to __register_sysctl_paths
we only pass a table to __register_sysctl_table that contains
no child directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Make __register_sysctl_table the core sysctl registration operation and
make it take a char * string as path.
Now that binary paths have been banished into the real of backwards
compatibility in kernel/binary_sysctl.c where they can be safely
ignored there is no longer a need to use struct ctl_path to represent
path names when registering ctl_tables.
Start the transition to using normal char * strings to represent
pathnames when registering sysctl tables. Normal strings are easier
to deal with both in the internal sysctl implementation and for
programmers registering sysctl tables.
__register_sysctl_paths is turned into a backwards compatibility wrapper
that converts a ctl_path array into a normal char * string.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Creating local copies of directory names is a good idea for
two reasons.
- The dynamic names used by callers must be copied into new
strings by the callers today to ensure the strings do not
change between register and unregister of the sysctl table.
- Sysctl directories have a potentially different lifetime
than the time between register and unregister of any
particular sysctl table.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
In sysctl_net register the two networking roots in the proper order.
In register_sysctl walk the sysctl sets in the reverse order of the
sysctl roots.
Remove parent from ctl_table_set and setup_sysctl_set as it is no
longer needed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
This adds a small helper retire_sysctl_set to remove the intimate knowledge about
the how a sysctl_set is implemented from net/sysct_net.c
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
I goofed when I made sysctl directories have nlink == 0.
nlink == 0 means the directory has been deleted.
nlink == 1 meands a directory does not count subdirectories.
Use the default nlink == 1 for sysctl directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Move the core sysctl code from kernel/sysctl.c and kernel/sysctl_check.c
into fs/proc/proc_sysctl.c.
Currently sysctl maintenance is hampered by the sysctl implementation
being split across 3 files with artificial layering between them.
Consolidate the entire sysctl implementation into 1 file so that
it is easier to see what is going on and hopefully allowing for
simpler maintenance.
For functions that are now only used in fs/proc/proc_sysctl.c remove
their declarations from sysctl.h and make them static in fs/proc/proc_sysctl.c
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Simplify the code by treating the base sysctl table like any other
sysctl table and register it with register_sysctl_table.
To ensure this table is registered early enough to avoid problems
call sysctl_init from proc_sys_init.
Rename sysctl_net.c:sysctl_init() to net_sysctl_init() to avoid
name conflicts now that kernel/sysctl.c:sysctl_init() is no longer
static.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Remove checks for conditions that will never happen. If procname is NULL
the loop would already had bailed out, so there's no need to check it
again.
At the same time this also compacts the function find_in_table() by
refactoring it to be easier to read.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
There is a misleading difference between /proc and /sys permissions, /proc is 0555 and /sys is 0755. But
as it is impossible to create or unlink something in /sys it would be nice to have same permissions.
Signed-off-by: Vitaly Kuznetsov <vitty@altlinux.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After adding devpts multiple-insrances sysctl kernel.pty.max limit pty count for
each devpts instance independently, while kernel.pty.nr shows total pty count.
This patch restores sysctl kernel.pty.max as global limit (4096 by default),
adds pty reseve for main devpts (mounted without "newinstance" argument),
and new sysctl to tune it: kernel.pty.reserve (1024 by default)
Also it adds devpts mount option "max=%d" to limit pty count for each devpts
instance independently. (by default NR_UNIX98_PTY_MAX == 2^20)
Thus devpts instances in containers cannot eat up all available pty even if we didn't
set any limits, while with "max" argument we can adjust limits more precisely.
Plus, now open("/dev/ptmx") return -ENOSPC in case lack of pty indexes,
this is more informative than -EIO.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Let's move this stuff to the better place, where we can account pty right in
tty-indexes managing code.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tracking the number of subdirectories requires an extra field that increases
the size of sysfs_dirent. nlinks are not particularly interesting for sysfs
and the nlink counts are wrong when network namespaces are involved so stop
counting them, and always return nlink == 1. Userspace already knows that
directories with nlink == 1 have an nlink count they can't use to count
subdirectories.
This reduces the size of sysfs_dirent by 8 bytes on 64bit platforms.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Store the sysfs inode number in an unsided int because
ida inode allocator can return at most a 31 bit number,
reducing the size of struct sysfs_dirent by 8 bytes
on 64bit platforms.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On 32bit this reduces sizeof(struct sysfs_dirent) by 2 bytes.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Compute a 31 bit hash of directory entries (that can fit in a signed
32bit off_t) and index the sysfs directory entries by that hash,
replacing the per directory indexes by name and by inode. Because we
now only use a single rbtree this reduces the size of sysfs_dirent by 2
pointers. Because we have fewer cases to deal with the code is now
simpler.
For now I use the simple hash that the dcache uses as that is easy to
use and seems simple enough.
In addition to makeing the code simpler using a hash for the file
position in readdir brings sysfs in line with other filesystems that
have non-trivial directory structures.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Pass information that quota is stored in system file to userspace
ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls
jbd: Issue cache flush after checkpointing
Recently an OOPS was observed from the usb serial io_ti driver when it tried to remove
sysfs directories. Upon investigation it turns out this driver was always buggy
and that a recent sysfs change had stopped guarding itself against removing attributes
from sysfs directories that had already been removed. :(
Historically we have been silent about attempting to files from nonexistent sysfs
directories and have politely returned error codes. That has resulted in people writing
broken code that ignores the error codes.
Issue a kernel WARNING and a stack backtrace to make it clear in no uncertain
terms that abusing sysfs is not ok, and the callers need to fix their code.
This change transforms the io_ti OOPS into a more comprehensible error message
and stack backtrace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Wolfgang Frisch <wfpub@roembden.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix new kernel-doc warnings:
Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs'
Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Per akpm suggestions alter the use of the term flush to be
invalidate. The next patch will do this across all MM.
This change is completely cosmetic.
[v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@novell.com>
Reviewed-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Rik Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
[v10: Fixed fs: move code out of buffer.c conflict change]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The usual kernel-doc fixups from Randy. Some of them David acked as
merged in his tree, this is the random left-overs.
* kernel-doc:
docbook: fix sched source file names in device-drivers book
docbook: change iomap source filename in deviceiobook
docbook: don't use serial_core.h in device-drivers book
kernel-doc: fix kernel-doc warnings in sched
kernel-doc: fix new warnings in cfg80211.h
kernel-doc: fix new warning in usb.h
kernel-doc: fix new warnings in device.h
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warning in regulator core
kernel-doc: fix new warnings in pci
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in auditsc.c
scripts/kernel-doc: fix fatal error caused by cfg80211.h
Quoth Andrew:
"Random fixes. And a simple new LED driver which I'm trying to sneak
in while you're not looking."
Sneaking successful.
* akpm:
score: fix off-by-one index into syscall table
mm: fix rss count leakage during migration
SHM_UNLOCK: fix Unevictable pages stranded after swap
SHM_UNLOCK: fix long unpreemptible section
kdump: define KEXEC_NOTE_BYTES arch specific for s390x
mm/hugetlb.c: undo change to page mapcount in fault handler
mm: memcg: update the correct soft limit tree during migration
proc: clear_refs: do not clear reserved pages
drivers/video/backlight/l4f00242t03.c: return proper error in l4f00242t03_probe if regulator_get() fails
drivers/video/backlight/adp88x0_bl.c: fix bit testing logic
kprobes: initialize before using a hlist
ipc/mqueue: simplify reading msgqueue limit
leds: add led driver for Bachmann's ot200
mm: __count_immobile_pages(): make sure the node is online
mm: fix NULL ptr dereference in __count_immobile_pages
mm: fix warnings regarding enum migrate_mode
* git://git.samba.org/sfrench/cifs-2.6:
CIFS: Rename *UCS* functions to *UTF16*
[CIFS] ACL and FSCACHE support no longer EXPERIMENTAL
[CIFS] Fix build break with multiuser patch when LANMAN disabled
cifs: warn about impending deprecation of legacy MultiuserMount code
cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts
cifs: sanitize username handling
keys: add a "logon" key type
cifs: lower default wsize when unix extensions are not used
cifs: better instrumentation for coalesce_t2
cifs: integer overflow in parse_dacl()
cifs: Fix sparse warning when calling cifs_strtoUCS
CIFS: Add descriptions to the brlock cache functions
Fix new kernel-doc warnings:
Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs'
Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
pages and corresponding page table entries of the task with PID pid, which
includes any special mappings inserted into the page tables in order to
provide things like vDSOs and user helper functions.
On ARM this causes a problem because the vectors page is mapped as a
global mapping and since ec706dab ("ARM: add a vma entry for the user
accessible vector page"), a VMA is also inserted into each task for this
page to aid unwinding through signals and syscall restarts. Since the
vectors page is required for handling faults, clearing the YOUNG bit (and
subsequently writing a faulting pte) means that we lose the vectors page
*globally* and cannot fault it back in. This results in a system deadlock
on the next exception.
To see this problem in action, just run:
$ echo 1 > /proc/self/clear_refs
on an ARM platform (as any user) and watch your system hang. I think this
has been the case since 2.6.37
This patch avoids clearing the aforementioned bits for reserved pages,
therefore leaving the vectors page intact on ARM. Since reserved pages
are not candidates for swap, this change should not have any impact on the
usefulness of clear_refs.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reported-by: Moussa Ba <moussaba@micron.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: <stable@vger.kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/accounting, proc: Fix /proc/stat interrupts sum
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT
x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore
x86/kprobes: Fix typo transferred from Intel manual
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits
x86, tsc: Fix SMI induced variation in quick_pit_calibrate()
x86, opcode: ANDN and Group 17 in x86-opcode-map.txt
x86/kconfig: Move the ZONE_DMA entry under a menu
x86/UV2: Add accounting for BAU strong nacks
x86/UV2: Ack BAU interrupt earlier
x86/UV2: Remove stale no-resources test for UV2 BAU
x86/UV2: Work around BAU bug
x86/UV2: Fix BAU destination timeout initialization
x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode
x86: Get rid of dubious one-bit signed bitfield
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qnx4: don't leak ->BitMap on late failure exits
qnx4: reduce the insane nesting in qnx4_checkroot()
qnx4: di_fname is an array, for crying out loud...
vfs: remove printk from set_nlink()
wake up s_wait_unfrozen when ->freeze_fs fails
The kernel contains some special internal keyrings, for instance the DNS
resolver keyring :
2a93faf1 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: empty
It would occasionally be useful to allow the contents of such keyrings to be
flushed by root (cache invalidation).
Allow a flag to be set on a keyring to mark that someone possessing the
sysadmin capability can clear the keyring, even without normal write access to
the keyring.
Set this flag on the special keyrings created by the DNS resolver, the NFS
identity mapper and the CIFS identity mapper.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
CIFS ACL support and FSCACHE support have been in long enough
to be no longer considered experimental. Remove obsolete Kconfig
dependency.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
We'll allow a grace period of 2 releases (3.3 and 3.4) and then remove
the legacy code in 3.5.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Fix up multiuser mounts to set the secType and set the username and
password from the key payload in the vol info for non-krb5 auth types.
Look for a key of type "secret" with a description of
"cifs🅰️<server address>" or "cifs:d:<domainname>". If that's found,
then scrape the username and password out of the key payload and use
that to create a new user session.
Finally, don't have the code enforce krb5 auth on multiuser mounts,
but do require a kernel with keys support.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently, it's not very clear whether you're allowed to have a NULL
vol->username or ses->user_name. Some places check for it and some don't.
Make it clear that a NULL pointer is OK in these fields, and ensure that
all the callers check for that.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
We've had some reports of servers (namely, the Solaris in-kernel CIFS
server) that don't deal properly with writes that are "too large" even
though they set CAP_LARGE_WRITE_ANDX. Change the default to better
mirror what windows clients do.
Cc: stable@vger.kernel.org
Cc: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Nick Davis <phireph0x@yahoo.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>