The toolchain prefix will most likely be site specific and is not
guaranteed to always be "mips-linux-gnu-", so simply don't specify one.
A quick "git grep" shows this to be consistent amongst other cross
compiled targets.
Similarly, the site specific initramfs source location should not be used,
since that won't exist for most people, and it prevents them from doing
coverage builds on the defconfigs, such as those done in linux-next and run
routinely by many others.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jayachandran C <jayachandranc@netlogicmicro.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3296/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit d065bd810b
(mm: retry page fault when blocking on disk transfer) and
commit 37b23e0525
(x86,mm: make pagefault killable)
The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.
These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.
Port these changes to MIPS.
Without these changes, my MIPS board encounters many hang and livelock
scenarios.
After applying this patch, OOM feature performance improves according to
my testing.
Signed-off-by: Mohd. Faris <mohdfarisq2010@gmail.com>
Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3217/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
_sdata is defined twice in vmlinux.lds.S. According to vmlinux.ld.h
in asm-generic, _sdata should be marked at the beginning RO_DATA_SECTION.
* _sdata = .;
* RO_DATA_SECTION(PAGE_SIZE)
* RW_DATA_SECTION(...)
* _edata = .;
Remove the one that is marked at RW_DATA_SECTION.
Signed-off-by: Tony Wu <tung7970@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3215/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Since a clocksource change post 3.2-rc1, tasks on my DB1500 board
hang after random amounts of time (from a few minutes to a few hours),
regardless of load. Debugging showed that the compare-match register
value is a few seconds lower than the current counter value.
The minimum value of 8 was initialy determined by a trial-and-error
approach. Currently it is sufficient for all Alchemys (without PCI
apparently), independent of CPU clock; only the DB1500 and DB1550
boards experience these timer-related tasks hangs now.
This patch increases the minimum timeout by 1 (to 9 counter ticks)
which seems sufficient since the systems are still working perfectly
fine after over 24 hours.
Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3214/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Add a few missing macros for the inlined (!CONFIG_GPIOLIB) GPIO case.
Fixes a build failure in the mmc core due to missing gpio_request_one()
function:
mmc/core/cd-gpio.c: In function 'mmc_cd_gpio_request':
mmc/core/cd-gpio.c:43:2: error: implicit declaration of function 'gpio_request_one' [-Werror=implicit-function-declaration]
Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3268/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[Pls also look at https://lkml.org/lkml/2012/2/10/228]
Using of PAT to change pages from WB to WC works quite nicely.
Changing it back to WB - not so much. The crux of the matter is
that the code that does this (__page_change_att_set_clr) has only
limited information so when it tries to the change it gets
the "raw" unfiltered information instead of the properly filtered one -
and the "raw" one tell it that PSE bit is on (while infact it
is not). As a result when the PTE is set to be WB from WC, we get
tons of:
:WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()
:Hardware name: HP xw4400 Workstation
.. snip..
:Pid: 27, comm: kswapd0 Tainted: G W 3.2.2-1.fc16.x86_64 #1
:Call Trace:
: [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0
: [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20
: [<ffffffff81005a17>] xen_make_pte+0x67/0xa0
: [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e
: [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00
: [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0
: [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190
: [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0
: [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0
: [<ffffffff8100a9b2>] ? check_events+0x12/0x20
: [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm]
: [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm]
: [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm]
: [<ffffffff8112ba04>] shrink_slab+0x154/0x310
: [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0
: [<ffffffff8112f4b8>] kswapd+0x178/0x3d0
: [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0
: [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50
: [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0
: [<ffffffff8108fb6c>] kthread+0x8c/0xa0
for every page. The proper fix for this is has been posted
and is https://lkml.org/lkml/2012/2/10/228
"x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations."
along with a detailed description of the problem and solution.
But since that posting has gone nowhere I am proposing
this band-aid solution so that at least users don't get
the page corruption (the pages that are WC don't get changed to WB
and end up being recycled for filesystem or other things causing
mysterious crashes).
The negative impact of this patch is that users of WC flag
(which are InfiniBand, radeon, nouveau drivers) won't be able
to set that flag - so they are going to see performance degradation.
But stability is more important here.
Fixes RH BZ# 742032, 787403, and 745574
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
commit 7347b4082e "xen: Allow
unprivileged Xen domains to create iomap pages" added a redundant
line in the early bootup code to filter out the PTE. That
filtering is already done a bit earlier so this extra processing
is not required.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If the irq happens in user mode, our kernel stack is empty
(apart from the pt_regs themselves, of course), so there's no
need or advantage to switch.
And it really doesn't save any stack space, quite the reverse:
it means that a nested interrupt cannot switch irq stacks. So
instead of saving kernel stack space, it actually causes the
potential for *more* stack usage.
Also simplify the preemption count copy when we do switch
stacks: just copy the whole preemption count, rather than just
the softirq parts of it. There is no advantage to the partial
copy: it is more effort to get a less correct result.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202191139260.10000@i5.linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, the NMI handler tests if it is nested by checking the
special variable saved on the stack (set during NMI handling)
and whether the saved stack is the NMI stack as well (to prevent
the race when the variable is set to zero).
But userspace may set their %rsp to any value as long as they do
not derefence it, and it may make it point to the NMI stack,
which will prevent NMIs from triggering while the userspace app
is running. (I tested this, and it is indeed the case)
Add another check to determine nested NMIs by looking at the
saved %cs (code segment register) and making sure that it is the
kernel code segment.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1329687817.1561.27.camel@acer.local.home
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lower the rating of the UV rtc clocksource to just below that of
the tsc, to improve performance.
Reading the tsc clocksource has lower latency than reading the
rtc, so favor it in situations where it is synchronized and
stable. When the tsc is unsynchronized, the rtc needs to be the
chosen clocksource.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/20120217141641.GA28063@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
resolved below within the FSI driver and allow the application of the
dmaeengine conversion that depends on this resolution.
Linux 3.3-rc4
Conflicts:
sound/soc/sh/fsi.c
Changed bfin_get_ether_addr() to return a state and to
set no random mac address if the board don't provide one.
Let the caller of bfin_get_ether_addr() set a random mac
address if the return value is not 0.
v2: don't set random mac in bfin_get_ether_addr()
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
Small minor conflict in bnx2x, wherein one commit changed how
statistics were stored in software, and another commit
fixed endianness bugs wrt. reading the values provided by
the chip in memory.
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrap accesses to the fd_sets in struct fdtable (for recording open files and
close-on-exec flags) so that we can move away from using fd_sets since we
abuse the fd_set structs by not allocating the full-sized structure under
normal circumstances and by non-core code looking at the internals of the
fd_sets.
The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
since that cannot be told about their abnormal lengths.
This introduces six wrapper functions for setting, clearing and testing
close-on-exec flags and fd-is-open flags:
void __set_close_on_exec(int fd, struct fdtable *fdt);
void __clear_close_on_exec(int fd, struct fdtable *fdt);
bool close_on_exec(int fd, const struct fdtable *fdt);
void __set_open_fd(int fd, struct fdtable *fdt);
void __clear_open_fd(int fd, struct fdtable *fdt);
bool fd_is_open(int fd, const struct fdtable *fdt);
Note that I've prepended '__' to the names of the set/clear functions because
they require the caller to hold a lock to use them.
Note also that I haven't added wrappers for looking behind the scenes at the
the array. Possibly that should exist too.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
These are the bug fixes that have accumulated since 3.3-rc3 in arm-soc.
The majority of them are regression fixes for stuff that broke during
the merge 3.3 window.
The notable ones are:
* The at91 ata drivers both broke because of an earlier cleanup patch that
some other patches were based on. Jean-Christophe decided to remove
the legacy at91_ide driver and fix the new-style at91-pata driver while
keeping the cleanup patch. I almost rejected the patches for being too
late and too big but in the end decided to accept them because they
fix a regression.
* A patch fixing build breakage from the sysdev-to-device conversion
colliding with other changes touches a number of mach-s3c files.
* b0654037 "ARM: orion: Fix Orion5x GPIO regression from MPP cleanup"
is a mechanical change that unfortunately touches a lot of lines
that should up in the diffstat.
* tag 'fixes-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
ARM: at91: drop ide driver in favor of the pata one
pata/at91: use newly introduced SMC accessors
ARM: at91: add accessor to manage SMC
ARM: at91:rtc/rtc-at91sam9: ioremap register bank
ARM: at91: USB AT91 gadget registration for module
ep93xx: fix build of vision_ep93xx.c
ARM: OMAP2xxx: PM: fix OMAP2xxx-specific UART idle bug in v3.3
ARM: orion: Fix USB phy for orion5x.
ARM: orion: Fix Orion5x GPIO regression from MPP cleanup
ARM: EXYNOS: Add cpu-offset property in gic device tree node
ARM: EXYNOS: Bring exynos4-dt up to date
ARM: OMAP3: cm-t35: fix section mismatch warning
ARM: OMAP2: Fix the OMAP2 only build break seen with 2011+ ARM tool-chains
ARM: tegra: paz00: fix wrong UART port on mini-pcie plug
ARM: tegra: paz00: fix wrong SD1 power gpio
i2c: tegra: Add devexit_p() for remove
ARM: EXYNOS: Correct M-5MOLS sensor clock frequency on Universal C210 board
ARM: EXYNOS: Correct framebuffer window size on Nuri board
ARM: SAMSUNG: Fix missing api-change from subsys_interface change
ARM: EXYNOS: Fix "warning: initialization from incompatible pointer type"
...
Here are a few more fixes for powerpc. Some are regressions, the rest
is simple/obvious/nasty enough that I deemed it good to go now.
Here's also step one of deprecating legacy iSeries support: we are
removing it from the main defconfig.
Nobody seems to be using it anymore and the code is nasty to maintain,
(involves horrible hacks in various low level areas of the kernel) so we
plan to actually rip it out at some point. For now let's just avoid
building it by default. Stephen will proceed to do the actual removal
later (probably 3.4 or 3.5).
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/perf: power_pmu_start restores incorrect values, breaking frequency events
powerpc/adb: Use set_current_state()
powerpc: Disable interrupts early in Program Check
powerpc: Remove legacy iSeries from ppc64_defconfig
powerpc/fsl/pci: Fix PCIe fixup regression
powerpc: Fix kernel log of oops/panic instruction dump
Move the assabet specific reset handling out of mcp-sa11x0.c, into its
board file. This leaves the mcp code free from all board specific
details.
Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add ucb1x00 platform data to enable GPIO support on the UCB1300 device.
Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch taken from 5dd7bf59e0 (ARM: sa11x0: Implement autoloading of codec
and codec pdata for mcp bus.) by Jochen Friedrich <jochen@scram.de>.
This adds just the codec data part of the patch.
Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
After all the FPU state cleanups and finally finding the problem that
caused all our FPU save/restore problems, this re-introduces the
preloading of FPU state that was removed in commit b3b0870ef3 ("i387:
do not preload FPU state at task switch time").
However, instead of simply reverting the removal, this reimplements
preloading with several fixes, most notably
- properly abstracted as a true FPU state switch, rather than as
open-coded save and restore with various hacks.
In particular, implementing it as a proper FPU state switch allows us
to optimize the CR0.TS flag accesses: there is no reason to set the
TS bit only to then almost immediately clear it again. CR0 accesses
are quite slow and expensive, don't flip the bit back and forth for
no good reason.
- Make sure that the same model works for both x86-32 and x86-64, so
that there are no gratuitous differences between the two due to the
way they save and restore segment state differently due to
architectural differences that really don't matter to the FPU state.
- Avoid exposing the "preload" state to the context switch routines,
and in particular allow the concept of lazy state restore: if nothing
else has used the FPU in the meantime, and the process is still on
the same CPU, we can avoid restoring state from memory entirely, just
re-expose the state that is still in the FPU unit.
That optimized lazy restore isn't actually implemented here, but the
infrastructure is set up for it. Of course, older CPU's that use
'fnsave' to save the state cannot take advantage of this, since the
state saving also trashes the state.
In other words, there is now an actual _design_ to the FPU state saving,
rather than just random historical baggage. Hopefully it's easier to
follow as a result.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves the bit that indicates whether a thread has ownership of the
FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
(called 'has_fpu') in task_struct->thread.has_fpu.
This fixes two independent bugs at the same time:
- changing 'thread_info->status' from the scheduler causes nasty
problems for the other users of that variable, since it is defined to
be thread-synchronous (that's what the "TS_" part of the naming was
supposed to indicate).
So perfectly valid code could (and did) do
ti->status |= TS_RESTORE_SIGMASK;
and the compiler was free to do that as separate load, or and store
instructions. Which can cause problems with preemption, since a task
switch could happen in between, and change the TS_USEDFPU bit. The
change to TS_USEDFPU would be overwritten by the final store.
In practice, this seldom happened, though, because the 'status' field
was seldom used more than once, so gcc would generally tend to
generate code that used a read-modify-write instruction and thus
happened to avoid this problem - RMW instructions are naturally low
fat and preemption-safe.
- On x86-32, the current_thread_info() pointer would, during interrupts
and softirqs, point to a *copy* of the real thread_info, because
x86-32 uses %esp to calculate the thread_info address, and thus the
separate irq (and softirq) stacks would cause these kinds of odd
thread_info copy aliases.
This is normally not a problem, since interrupts aren't supposed to
look at thread information anyway (what thread is running at
interrupt time really isn't very well-defined), but it confused the
heck out of irq_fpu_usable() and the code that tried to squirrel
away the FPU state.
(It also caused untold confusion for us poor kernel developers).
It also turns out that using 'task_struct' is actually much more natural
for most of the call sites that care about the FPU state, since they
tend to work with the task struct for other reasons anyway (ie
scheduling). And the FPU data that we are going to save/restore is
found there too.
Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
the %esp issue.
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: Raphael Prevost <raphael@buro.asia>
Acked-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dev field is remove from struct regulator_consumer_supply since commit
737f36 "regulator: Remove support for supplies specified by struct device".
This fixes below build error:
CC arch/arm/mach-u300/i2c.o
arch/arm/mach-u300/i2c.c:63: error: unknown field 'dev' specified in initializer
arch/arm/mach-u300/i2c.c:95: error: unknown field 'dev' specified in initializer
make[1]: *** [arch/arm/mach-u300/i2c.o] Error 1
make: *** [arch/arm/mach-u300] Error 2
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Set addr_assign_type correctly to NET_ADDR_RANDOM in case
a random MAC address was generated and assigned to the netdevice.
Return state from setup_etheraddr() about returning a random
MAC address or not and check this state in eth_configure().
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated at91sam9xxxx_idle() functions introduced
by commit c9dfafb "ARM: mach-at91: move special idle code out of line".
Replace by a generic at91sam9_idle() function in setup.c common
location.
Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
The conversion of the ktime to a value suitable for the clock comparator
does not take changes to wall_to_monotonic into account. In fact the
conversion just needs the boot clock (sched_clock_base_cc) and the
total_sleep_time.
This is applicable to 3.2+ kernels.
CC: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The page_table_free_pgste function is used for kvm processes to free page
tables that have the pgste extension. It calls pgtable_page_ctor instead of
pgtable_page_dtor which increases NR_PAGETABLE instead of decreasing it.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid calling wake_up() from our NMI "bottom halve" from RCU extended
quiescent state in idle. wake_up() has RCU read-side critical sections
but this will be completely ignored by RCU if the cpu is in extended
quiescent state.
Which means that whatever object is being accessed from within the
read-side critical section can be freed concurrently from a different
cpu.
So make sure we leave extended quiescent state before calling wake_up().
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make the uprobes code readable to me:
- improve the Kconfig text so that a mere mortal gets some idea
what CONFIG_UPROBES=y is really about
- do trivial renames to standardize around the uprobes_*() namespace
- clean up and simplify various code flow details
- separate basic blocks of functionality
- line break artifact and white space related removal
- use standard local varible definition blocks
- use vertical spacing to make things more readable
- remove unnecessary volatile
- restructure comment blocks to make them more uniform and
more readable in general
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Link: http://lkml.kernel.org/n/tip-ewbwhb8o6navvllsauu7k07p@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It takes a couple of instructions to actually configure a clock event
so setting an alarm just 1 clock cycle in the future isn't going to work;
doing so results in setting an alarm in the "past" in which case the event
won't fire until the timer overflows and rolls back around to the "current
time".
Not quite sure how many clock cycles it actually takes to get through to
actually writing the register, but 100 seems to work reliably.
Use generic helper to set up the clock event while we're at it.
Reported-by: Jan Schulte <jan.schulte@aacmicrotec.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Add uprobes support to the core kernel, with x86 support.
This commit adds the kernel facilities, the actual uprobes
user-space ABI and perf probe support comes in later commits.
General design:
Uprobes are maintained in an rb-tree indexed by inode and offset
(the offset here is from the start of the mapping). For a unique
(inode, offset) tuple, there can be at most one uprobe in the
rb-tree.
Since the (inode, offset) tuple identifies a unique uprobe, more
than one user may be interested in the same uprobe. This provides
the ability to connect multiple 'consumers' to the same uprobe.
Each consumer defines a handler and a filter (optional). The
'handler' is run every time the uprobe is hit, if it matches the
'filter' criteria.
The first consumer of a uprobe causes the breakpoint to be
inserted at the specified address and subsequent consumers are
appended to this list. On subsequent probes, the consumer gets
appended to the existing list of consumers. The breakpoint is
removed when the last consumer unregisters. For all other
unregisterations, the consumer is removed from the list of
consumers.
Given a inode, we get a list of the mms that have mapped the
inode. Do the actual registration if mm maps the page where a
probe needs to be inserted/removed.
We use a temporary list to walk through the vmas that map the
inode.
- The number of maps that map the inode, is not known before we
walk the rmap and keeps changing.
- extending vm_area_struct wasn't recommended, it's a
size-critical data structure.
- There can be more than one maps of the inode in the same mm.
We add callbacks to the mmap methods to keep an eye on text vmas
that are of interest to uprobes. When a vma of interest is mapped,
we insert the breakpoint at the right address.
Uprobe works by replacing the instruction at the address defined
by (inode, offset) with the arch specific breakpoint
instruction. We save a copy of the original instruction at the
uprobed address.
This is needed for:
a. executing the instruction out-of-line (xol).
b. instruction analysis for any subsequent fixups.
c. restoring the instruction back when the uprobe is unregistered.
We insert or delete a breakpoint instruction, and this
breakpoint instruction is assumed to be the smallest instruction
available on the platform. For fixed size instruction platforms
this is trivially true, for variable size instruction platforms
the breakpoint instruction is typically the smallest (often a
single byte).
Writing the instruction is done by COWing the page and changing
the instruction during the copy, this even though most platforms
allow atomic writes of the breakpoint instruction. This also
mirrors the behaviour of a ptrace() memory write to a PRIVATE
file map.
The core worker is derived from KSM's replace_page() logic.
In essence, similar to KSM:
a. allocate a new page and copy over contents of the page that
has the uprobed vaddr
b. modify the copy and insert the breakpoint at the required
address
c. switch the original page with the copy containing the
breakpoint
d. flush page tables.
replace_page() is being replicated here because of some minor
changes in the type of pages and also because Hugh Dickins had
plans to improve replace_page() for KSM specific work.
Instruction analysis on x86 is based on instruction decoder and
determines if an instruction can be probed and determines the
necessary fixups after singlestep. Instruction analysis is done
at probe insertion time so that we avoid having to repeat the
same analysis every time a probe is hit.
A lot of code here is due to the improvement/suggestions/inputs
from Peter Zijlstra.
Changelog:
(v10):
- Add code to clear REX.B prefix as suggested by Denys Vlasenko
and Masami Hiramatsu.
(v9):
- Use insn_offset_modrm as suggested by Masami Hiramatsu.
(v7):
Handle comments from Peter Zijlstra:
- Dont take reference to inode. (expect inode to uprobe_register to be sane).
- Use PTR_ERR to set the return value.
- No need to take reference to inode.
- use PTR_ERR to return error value.
- register and uprobe_unregister share code.
(v5):
- Modified del_consumer as per comments from Peter.
- Drop reference to inode before dropping reference to uprobe.
- Use i_size_read(inode) instead of inode->i_size.
- Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called.
- Includes errno.h as recommended by Stephen Rothwell to fix a build issue
on sparc defconfig
- Remove restrictions while unregistering.
- Earlier code leaked inode references under some conditions while
registering/unregistering.
- Continue the vma-rmap walk even if the intermediate vma doesnt
meet the requirements.
- Validate the vma found by find_vma before inserting/removing the
breakpoint
- Call del_consumer under mutex_lock.
- Use hash locks.
- Handle mremap.
- Introduce find_least_offset_node() instead of close match logic in
find_uprobe
- Uprobes no more depends on MM_OWNER; No reference to task_structs
while inserting/removing a probe.
- Uses read_mapping_page instead of grab_cache_page so that the pages
have valid content.
- pass NULL to get_user_pages for the task parameter.
- call SetPageUptodate on the new page allocated in write_opcode.
- fix leaking a reference to the new page under certain conditions.
- Include Instruction Decoder if Uprobes gets defined.
- Remove const attributes for instruction prefix arrays.
- Uses mm_context to know if the application is 32 bit.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Also-written-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
[ Made various small edits to the commit log ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/openrisc/include/asm/uaccess.h: included 'linux/thread_info.h'
twice, remove the duplicate.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block
is pending in the shared queue.
Also, use the new helper function introduced in commit 5e6292c0f2
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures. In the past some architectures got this
code wrong, so using this helper function should stop that from
happening again.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux@lists.openrisc.net
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
get_signal_to_deliver() already resets the signal handler if
SA_ONESHOT is set in ka->sa.sa_flags, there's no need to do it again
in handle_signal(). Furthermore, because we were modifying
ka->sa.sa_handler (which is a copy of sighand->action[]) instead of
sighand->action[] the original code actually had no effect on signal
delivery.
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux@lists.openrisc.net
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Instead of open coding the sequence from force_sigsegv() just call
it. This also fixes a bug because we were modifying ka->sa.sa_handler
(which is a copy of sighand->action[]), whereas the intention of the
code was to modify sighand->action[] directly.
As the original code was working with a copy it had no effect on
signal delivery.
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux@lists.openrisc.net
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
This patch enables passing a fdt pointer to the kernel.
This makes for the kernel parameter API:
void kernel(unsigned int fdt);
which, in accordance with the OpenRISC ABI results in:
r3 = pointer to fdt
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Use BUG_ON(x) rather than if(x) BUG();
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@ identifier x; @@
-if (x) BUG();
+BUG_ON(x);
@@ identifier x; @@
-if (!x) BUG();
+BUG_ON(!x);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Commits d7e7528bcd and
b05d8447e7 simplified the usage of the
audit_syscall_[entry|exit] functions. Unfortunately, the OpenRISC
architecture didn't get fixed up along with the other architectures when
those patches were pushed. This makes the relevant changes to this
architecture.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
pending. In order to not leak FIP state from one process to another, we
need to do a floating point load after the fxsave of the old process,
and before the fxrstor of the new FPU state. That resets the state to
the (uninteresting) kernel load, rather than some potentially sensitive
user information.
We used to do this directly after the FPU state save, but that is
actually very inconvenient, since it
(a) corrupts what is potentially perfectly good FPU state that we might
want to lazy avoid restoring later and
(b) on x86-64 it resulted in a very annoying ordering constraint, where
"__unlazy_fpu()" in the task switch needs to be delayed until after
the DS segment has been reloaded just to get the new DS value.
Coupling it to the fxrstor instead of the fxsave automatically avoids
both of these issues, and also ensures that we only do it when actually
necessary (the FP state after a save may never actually get used). It's
simply a much more natural place for the leaked state cleanup.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>