linux

Author	SHA1	Message	Date
Dave Chinner	0e1fdafd93	superblock: add filesystem shrinker operations Now we have a per-superblock shrinker implementation, we can add a filesystem specific callout to it to allow filesystem internal caches to be shrunk by the superblock shrinker. Rather than perpetuate the multipurpose shrinker callback API (i.e. nr_to_scan == 0 meaning "tell me how many objects freeable in the cache), two operations will be added. The first will return the number of objects that are freeable, the second is the actual shrinker call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:41 -04:00
J. Bruce Fields	8fb47a4fbf	locks: rename lock-manager ops Both the filesystem and the lock manager can associate operations with a lock. Confusingly, one of them (fl_release_private) actually has the same name in both operation structures. It would save some confusion to give the lock-manager ops different names. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-20 20:23:19 -04:00
Linus Torvalds	919d25a710	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86. reboot: Make Dell Latitude E6320 use reboot=pci x86, doc only: Correct real-mode kernel header offset for init_size x86: Disable AMD_NUMA for 32bit for now	2011-07-20 15:33:59 -07:00
Al Viro	76fe3276be	->permission() sanitizing: document API changes Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:36 -04:00
Al Viro	10556cb21a	->permission() sanitizing: don't pass flags to ->permission() not used by the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:24 -04:00
Al Viro	7e40145eb1	->permission() sanitizing: don't pass flags to ->check_acl() not used in the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:21 -04:00
Dave Martin	540b573875	ARM: 6999/1: head, zImage: Always Enter the kernel in ARM state Currently, the documented kernel entry requirements are not explicit about whether the kernel should be entered in ARM or Thumb, leading to an ambiguitity about how to enter Thumb-2 kernels. As a result, the kernel is reliant on the zImage decompressor to enter the kernel proper in the correct instruction set state. This patch changes the boot entry protocol for head.S and Image to be the same as for zImage: in all cases, the kernel is now entered in ARM. Documentation/arm/Booting is updated to reflect this new policy. A different rule will be needed for Cortex-M class CPUs as and when support for those lands in mainline, since these CPUs don't support the ARM instruction set at all: a note is added to the effect that the kernel must be entered in Thumb on such systems. Signed-off-by: Dave Martin <dave.martin@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2011-07-19 12:00:53 +01:00
Takashi Iwai	737c265bb8	ALSA: hda - Add documentation for codec-specific mixer controls Signed-off-by: Takashi Iwai <tiwai@suse.de>	2011-07-19 09:34:10 +02:00
J. Bruce Fields	c46556c6be	nfsd4: update nfsv4.1 implementation notes Update documentation to reflect recent progress. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-18 18:41:33 -04:00
Keiichi Kii	21d541aa19	updated Documentation/ja_JP/SubmittingPatches Updated Documentaion/ja_JP/SubmittingPatches due to changes from upstream version(2.6.39) of the file. And also, this translated SubmittingPatches is already reviewed by JF project. The JF project is the Japanese equivalent of LDP. Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-07-18 13:37:12 -07:00
Akinobu Mita	d0a542637f	debugfs: add documentation for debugfs_create_x64 debugfs_create_x64() exists. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-07-18 13:36:15 -07:00
Mimi Zohar	7102ebcd65	evm: permit only valid security.evm xattrs to be updated In addition to requiring CAP_SYS_ADMIN permission to modify/delete security.evm, prohibit invalid security.evm xattrs from changing, unless in fixmode. This patch prevents inadvertent 'fixing' of security.evm to reflect offline modifications. Changelog v7: - rename boot paramater 'evm_mode' to 'evm' Reported-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Mimi Zohar <zohar@us.ibm.com>	2011-07-18 12:29:49 -04:00
Mimi Zohar	66dbc325af	evm: re-release EVM protects a file's security extended attributes(xattrs) against integrity attacks. This patchset provides the framework and an initial method. The initial method maintains an HMAC-sha1 value across the security extended attributes, storing the HMAC value as the extended attribute 'security.evm'. Other methods of validating the integrity of a file's metadata will be posted separately (eg. EVM-digital-signatures). While this patchset does authenticate the security xattrs, and cryptographically binds them to the inode, coming extensions will bind other directory and inode metadata for more complete protection. To help simplify the review and upstreaming process, each extension will be posted separately (eg. IMA-appraisal, IMA-appraisal-directory). For a general overview of the proposed Linux integrity subsystem, refer to Dave Safford's whitepaper: http://downloads.sf.net/project/linux-ima/linux-ima/Integrity_overview.pdf. EVM depends on the Kernel Key Retention System to provide it with a trusted/encrypted key for the HMAC-sha1 operation. The key is loaded onto the root's keyring using keyctl. Until EVM receives notification that the key has been successfully loaded onto the keyring (echo 1 > <securityfs>/evm), EVM can not create or validate the 'security.evm' xattr, but returns INTEGRITY_UNKNOWN. Loading the key and signaling EVM should be done as early as possible. Normally this is done in the initramfs, which has already been measured as part of the trusted boot. For more information on creating and loading existing trusted/encrypted keys, refer to Documentation/keys-trusted-encrypted.txt. A sample dracut patch, which loads the trusted/encrypted key and enables EVM, is available from http://linux-ima.sourceforge.net/#EVM. Based on the LSMs enabled, the set of EVM protected security xattrs is defined at compile. EVM adds the following three calls to the existing security hooks: evm_inode_setxattr(), evm_inode_post_setxattr(), and evm_inode_removexattr. To initialize and update the 'security.evm' extended attribute, EVM defines three calls: evm_inode_post_init(), evm_inode_post_setattr() and evm_inode_post_removexattr() hooks. To verify the integrity of a security xattr, EVM exports evm_verifyxattr(). Changelog v7: - Fixed URL in EVM ABI documentation Changelog v6: (based on Serge Hallyn's review) - fix URL in patch description - remove evm_hmac_size definition - use SHA1_DIGEST_SIZE (removed both MAX_DIGEST_SIZE and evm_hmac_size) - moved linux include before other includes - test for crypto_hash_setkey failure - fail earlier for invalid key - clear entire encrypted key, even on failure - check xattr name length before comparing xattr names Changelog: - locking based on i_mutex, remove evm_mutex - using trusted/encrypted keys for storing the EVM key used in the HMAC-sha1 operation. - replaced crypto hash with shash (Dmitry Kasatkin) - support for additional methods of verifying the security xattrs (Dmitry Kasatkin) - iint not allocated for all regular files, but only for those appraised - Use cap_sys_admin in lieu of cap_mac_admin - Use __vfs_setxattr_noperm(), without permission checks, from EVM Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>	2011-07-18 12:29:40 -04:00
Nicolas Pitre	632b7cf6c0	ARM: mach-s3c2400: delete On Tue, 28 Jun 2011, Ben Dooks wrote: > On Tue, Jun 28, 2011 at 11:22:57PM +0200, Arnd Bergmann wrote: > > > On a related note, what about mach-s3c2400? It seems to be even more > > incomplete. > > Probably the same fate awaits that. It is so old that there's little > incentive to do anything with it. So out it goes as well. The PORT_S3C2400 definition in include/linux/serial_core.h is left there to prevent a reuse of the same number for another port type. Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de>	2011-07-18 10:59:26 -04:00
Arnd Bergmann	3a6cb8ce07	Merge branch 'zynq/master' of git+ssh://master.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc into next/soc Conflicts: arch/arm/Kconfig arch/arm/mm/Kconfig	2011-07-17 21:43:26 +02:00
Takao Indoh	4996c02306	ACPI: introduce "acpi_rsdp=" parameter for kdump There is a problem with putting the first kernel in EFI virtual mode, it is that when the second kernel comes up it tries to initialize the EFI again and once we have put EFI in virtual mode we can not really do that. Actually, EFI is not necessary for kdump, we can boot the second kernel with "noefi" parameter, but the boot will mostly fail because 2nd kernel cannot find RSDP. In this situation, we introduced "acpi_rsdp=" kernel parameter, so that kexec-tools can pass the "noefi acpi_rsdp=X" to the second kernel to make kdump works. The physical address of the RSDP can be got from sysfs(/sys/firmware/efi/systab). Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com> Reviewed-by: WANG Cong <amwang@redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-07-16 18:40:16 -04:00
Stefan Richter	9a00c24ae7	firewire: document the sysfs ABIs of firewire-core and firewire-sbp2. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2011-07-16 07:24:33 +02:00
Stefan Richter	f6a7cd0212	firewire: cdev: ABI documentation enhancements Add overview documentation in Documentation/ABI/stable/firewire-cdev. Improve the inline reference documentation in firewire-cdev.h: - Add /* available since kernel... */ comments to event numbers consistent with the comments on ioctl numbers. - Shorten some documentation on an event and an ioctl that are less interesting to current programming because there are newer preferable variants. - Spell Configuration ROM (name of an IEEE 1212 register) in upper case. - Move the dummy FW_CDEV_VERSION out of the reader's field of vision. We should remove it from the header next year or so. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2011-07-16 07:24:32 +02:00
Grant Likely	8c11642a50	Merge commit 'v3.0-rc7' into devicetree/next	2011-07-15 20:11:34 -06:00
NeilBrown	49b28684fd	nfsd: Remove deprecated nfsctl system call and related code. As promised in feature-removal-schedule.txt it is time to remove the nfsctl system call. Userspace has perferred to not use this call throughout 2.6 and it has been excluded in the default configuration since 2.6.36 (9 months ago). So this patch removes all the code that was being compiled out. There are still references to sys_nfsctl in various arch systemcall tables and related code. These should be cleaned out too, probably in the next merge window. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:42 -04:00
Rafael J. Wysocki	7ae033cc0d	Merge branch 'pm-runtime' into for-linus * pm-runtime: OMAP: PM: disable idle on suspend for GPIO and UART OMAP: PM: omap_device: add API to disable idle on suspend OMAP: PM: omap_device: add system PM methods for PM domain handling OMAP: PM: omap_device: conditionally use PM domain runtime helpers PM / Runtime: Add new helper function: pm_runtime_status_suspended() PM / Runtime: Consistent utilization of deferred_resume PM / Runtime: Prevent runtime_resume from racing with probe PM / Runtime: Replace "run-time" with "runtime" in documentation PM / Runtime: Improve documentation of enable, disable and barrier PM: Limit race conditions between runtime PM and system sleep (v2) PCI / PM: Detect early wakeup in pci_pm_prepare() PM / Runtime: Return special error code if runtime PM is disabled PM / Runtime: Update documentation of interactions with system sleep	2011-07-15 23:59:25 +02:00
Rafael J. Wysocki	ba1389d74f	Merge branch 'pm-domains' into for-linus * pm-domains: (33 commits) ARM / shmobile: Return -EBUSY from A4LC power off if A3RV is active PM / Domains: Take .power_off() error code into account ARM / shmobile: Use genpd_queue_power_off_work() ARM / shmobile: Use pm_genpd_poweroff_unused() PM / Domains: Introduce function to power off all unused PM domains PM / Domains: Queue up power off work only if it is not pending PM / Domains: Improve handling of wakeup devices during system suspend PM / Domains: Do not restore all devices on power off error PM / Domains: Allow callbacks to execute all runtime PM helpers PM / Domains: Do not execute device callbacks under locks PM / Domains: Make failing pm_genpd_prepare() clean up properly PM / Domains: Set device state to "active" during system resume ARM: mach-shmobile: sh7372 A3RV requires A4LC PM / Domains: Export pm_genpd_poweron() in header ARM: mach-shmobile: sh7372 late pm domain off ARM: mach-shmobile: Runtime PM late init callback ARM: mach-shmobile: sh7372 D4 support ARM: mach-shmobile: sh7372 A4MP support ARM: mach-shmobile: sh7372: make sure that fsi is peripheral of spu2 ARM: mach-shmobile: sh7372 A3SG support ...	2011-07-15 23:59:09 +02:00
Nishanth Menon	99f381d354	PM / OPP: Introduce function to free cpufreq table cpufreq table allocated by opp_init_cpufreq_table is better freed by OPP layer itself. This allows future modifications to the table handling to be transparent to the users. Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-07-15 23:58:18 +02:00
Masami Hiramatsu	6142431810	tracing/kprobes: Support module init function probing To support probing module init functions, kprobe-tracer allows user to define a probe on non-existed function when it is given with a module name. This also enables user to set a probe on a function on a specific module, even if a same name (but different) function is locally defined in another module. The module name must be in the front of function name and separated by a ':'. e.g. btrfs:btrfs_init_sysfs Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/20110627072656.6528.89970.stgit@fedora15 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-07-15 15:17:14 -04:00
Linus Torvalds	af8a927c8b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: remove resize from unsupported features list	2011-07-15 11:03:49 -07:00
Olof Johansson	eb5064db40	gpio/tegra: dt: add binding for gpio polarity Allocate one bit in the available extra cell to indicate if the gpio should be considered logically inverted. Signed-off-by: Olof Johansson <olof@lixom.net> Acked-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-07-15 11:51:27 -06:00
John W. Linville	95a943c162	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: net/bluetooth/l2cap_core.c	2011-07-15 10:05:24 -04:00
Russell King	4aa96ccf9e	Merge branch 'kprobes-thumb' of git://git.yxit.co.uk/linux into devel-stable	2011-07-15 10:06:42 +01:00
Andy Lutomirski	98eedc3a9d	Document the vDSO and add a reference parser It turns out that parsing the vDSO is nontrivial if you don't already have an ELF dynamic loader around. So document it in Documentation/ABI and add a reference CC0-licenced parser. This code is dedicated to Go issue 1933: http://code.google.com/p/go/issues/detail?id=1933 Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/a315a9514cd71bcf29436cc31e35aada21a5ff21.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 17:57:09 -07:00
Shawn Guo	22a85e4cd5	spi/imx: add device tree probe support It adds device tree probe support for spi-imx driver. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-07-14 13:47:18 -06:00
David S. Miller	6a7ebdf2fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-14 07:56:40 -07:00
Linus Torvalds	5d7d5d9332	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) slip: fix wrong SLIP6 ifdef-endif placing natsemi: fix another dma-debug report sctp: ABORT if receive, reassmbly, or reodering queue is not empty while closing socket net: Fix default in docs for tcp_orphan_retries. hso: fix a use after free condition net/natsemi: Fix module parameter permissions XFRM: Fix memory leak in xfrm_state_update sctp: Enforce retransmission limit during shutdown mac80211: fix TKIP replay vulnerability mac80211: fix ie memory allocation for scheduled scans ssb: fix init regression of hostmode PCI core rtlwifi: rtl8192cu: Add new USB ID for Netgear WNA1000M ath9k: Fix tx throughput drops for AR9003 chips with AES encryption carl9170: add NEC WL300NU-AG usbid cfg80211: fix deadlock with rfkill/sched_scan by adding new mutex ath5k: fix incorrect use of drvdata in PCI suspend/resume code ath5k: fix incorrect use of drvdata in sysfs code Bluetooth: Fix memory leak under page timeouts Bluetooth: Fix regression with incoming L2CAP connections Bluetooth: Fix hidp disconnect deadlocks and lost wakeup ...	2011-07-13 13:51:32 -07:00
Ryusuke Konishi	3a36199114	nilfs2: remove resize from unsupported features list Resize feature was supported by the commit `4e33f9eab0` but it was not reflected to the list of unsupported features in nilfs2.txt file. This updates the list to fix discrepancy. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-07-13 16:08:59 +09:00
Michał Mirosław	e5b1de1f5e	net: Add documentation for netdev features handling v2: incorporated suggestions from Randy Dunlap Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-12 22:27:00 -07:00
Darren Hart	11e48feebe	x86, doc only: Correct real-mode kernel header offset for init_size The real-mode kernel header init_size field is located at 0x260 per the field listing in th e"REAL-MODE KERNEL HEADER" section. It is listed as 0x25c in the "DETAILS OF HEADER FIELDS" section, which overlaps with pref_address. Correct the details listing to 0x260. Signed-off-by: Darren Hart <dvhart@linux.intel.com> Link: http://lkml.kernel.org/r/541cf88e2dfe5b8186d8b96b136d892e769a68c1.1310441260.git.dvhart@linux.intel.com CC: H. Peter Anvin <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 20:39:11 -07:00
Glauber Costa	9ddabbe72e	KVM: KVM Steal time guest/host interface To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM. This is per-vcpu, and using the kvmclock structure for that is an abuse we decided not to make. In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that holds the memory area address containing information about steal time This patch contains the headers for it. I am keeping it separate to facilitate backports to people who wants to backport the kernel part but not the hypervisor, or the other way around. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:17:03 +03:00
Paul Mackerras	aa04b4cc5b	KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead. This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
Paul Mackerras	371fefd6f2	KVM: PPC: Allow book3s_hv guests to use SMT processor modes This lifts the restriction that book3s_hv guests can only run one hardware thread per core, and allows them to use up to 4 threads per core on POWER7. The host still has to run single-threaded. This capability is advertised to qemu through a new KVM_CAP_PPC_SMT capability. The return value of the ioctl querying this capability is the number of vcpus per virtual CPU core (vcore), currently 4. To use this, the host kernel should be booted with all threads active, and then all the secondary threads should be offlined. This will put the secondary threads into nap mode. KVM will then wake them from nap mode and use them for running guest code (while they are still offline). To wake the secondary threads, we send them an IPI using a new xics_wake_cpu() function, implemented in arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage we assume that the platform has a XICS interrupt controller and we are using icp-native.c to drive it. Since the woken thread will need to acknowledge and clear the IPI, we also export the base physical address of the XICS registers using kvmppc_set_xics_phys() for use in the low-level KVM book3s code. When a vcpu is created, it is assigned to a virtual CPU core. The vcore number is obtained by dividing the vcpu number by the number of threads per core in the host. This number is exported to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes to run the guest in single-threaded mode, it should make all vcpu numbers be multiples of the number of threads per core. We distinguish three states of a vcpu: runnable (i.e., ready to execute the guest), blocked (that is, idle), and busy in host. We currently implement a policy that the vcore can run only when all its threads are runnable or blocked. This way, if a vcpu needs to execute elsewhere in the kernel or in qemu, it can do so without being starved of CPU by the other vcpus. When a vcore starts to run, it executes in the context of one of the vcpu threads. The other vcpu threads all go to sleep and stay asleep until something happens requiring the vcpu thread to return to qemu, or to wake up to run the vcore (this can happen when another vcpu thread goes from busy in host state to blocked). It can happen that a vcpu goes from blocked to runnable state (e.g. because of an interrupt), and the vcore it belongs to is already running. In that case it can start to run immediately as long as the none of the vcpus in the vcore have started to exit the guest. We send the next free thread in the vcore an IPI to get it to start to execute the guest. It synchronizes with the other threads via the vcore->entry_exit_count field to make sure that it doesn't go into the guest if the other vcpus are exiting by the time that it is ready to actually enter the guest. Note that there is no fixed relationship between the hardware thread number and the vcpu number. Hardware threads are assigned to vcpus as they become runnable, so we will always use the lower-numbered hardware threads in preference to higher-numbered threads if not all the vcpus in the vcore are runnable, regardless of which vcpus are runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
David Gibson	54738c0971	KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode This improves I/O performance for guests using the PAPR paravirtualization interface by making the H_PUT_TCE hcall faster, by implementing it in real mode. H_PUT_TCE is used for updating virtual IOMMU tables, and is used both for virtual I/O and for real I/O in the PAPR interface. Since this moves the IOMMU tables into the kernel, we define a new KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The ioctl returns a file descriptor which can be used to mmap the newly created table. The qemu driver models use them in the same way as userspace managed tables, but they can be updated directly by the guest with a real-mode H_PUT_TCE implementation, reducing the number of host/guest context switches during guest IO. There are certain circumstances where it is useful for userland qemu to write to the TCE table even if the kernel H_PUT_TCE path is used most of the time. Specifically, allowing this will avoid awkwardness when we need to reset the table. More importantly, we will in the future need to write the table in order to restore its state after a checkpoint resume or migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:56 +03:00
Paul Mackerras	de56a948b9	KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:54 +03:00
Scott Wood	a4cd8b23ac	KVM: PPC: e500: enable magic page This is a shared page used for paravirtualization. It is always present in the guest kernel's effective address space at the address indicated by the hypercall that enables it. The physical address specified by the hypercall is not used, as e500 does not have real mode. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:37 +03:00
Avi Kivity	411c588dfb	KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0 When CR0.WP=0, we sometimes map user pages as kernel pages (to allow the kernel to write to them). Unfortunately this also allows the kernel to fetch from these pages, even if CR4.SMEP is set. Adjust for this by also setting NX on the spte in these circumstances. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:26 +03:00
Jan Kiszka	58f0964ee4	KVM: Fix KVM_ASSIGN_SET_MSIX_ENTRY documentation The documented behavior did not match the implemented one (which also never changed). Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:19 +03:00
Jan Kiszka	91e3d71db2	KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentation Neither host_irq nor the guest_msi struct are used anymore today. Tag the former, drop the latter to avoid confusion. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:16 +03:00
Jan Kiszka	7f4382e8fd	KVM: Fixup documentation section numbering Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:10 +03:00
Sasha Levin	55399a02e9	KVM: Document KVM_IOEVENTFD Document KVM_IOEVENTFD that can be used to receive notifications of PIO/MMIO events without triggering an exit. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:56 +03:00
Nadav Har'El	823e396558	KVM: nVMX: Documentation This patch includes a brief introduction to the nested vmx feature in the Documentation/kvm directory. The document also includes a copy of the vmcs12 structure, as requested by Avi Kivity. [marcelo: move to Documentation/virtual/kvm] Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:22 +03:00
Kevin Hilman	f3393b62f1	PM / Runtime: Add new helper function: pm_runtime_status_suspended() This boolean function simply returns whether or not the runtime status of the device is 'suspended'. Unlike pm_runtime_suspended(), this function returns the runtime status whether or not runtime PM for the device has been disabled or not. Also add entry to Documentation/power/runtime.txt Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-07-12 11:17:09 +02:00
Avi Kivity	e76779339b	KVM: Document KVM_GET_LAPIC, KVM_SET_LAPIC ioctl Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:44:55 +03:00
Linus Torvalds	5adaf851d2	Documentation/Changes: remove some really obsolete text That file harkens back to the days of the big 2.4 -> 2.6 version jump, and was based even then on older versions. Some of it is just obsolete, and Jesper Juhl points out that it talks about kernel versions 2.6 and should be updated to 3.0. Remove some obsolete text, and re-phrase some other to not be 2.6-specific. Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-11 16:48:38 -07:00

... 39 40 41 42 43 ...

11053 Commits