We don't need to flush large pages by PAGE_SIZE step, that just waste
time. and actually, large page don't need 'invlpg' optimizing according
to our micro benchmark. So, just flush whole TLB is enough for them.
The following result is tested on a 2CPU * 4cores * 2HT NHM EP machine,
with THP 'always' setting.
Multi-thread testing, '-t' paramter is thread number:
without this patch with this patch
./mprotect -t 1 14ns 13ns
./mprotect -t 2 13ns 13ns
./mprotect -t 4 12ns 11ns
./mprotect -t 8 14ns 10ns
./mprotect -t 16 28ns 28ns
./mprotect -t 32 54ns 52ns
./mprotect -t 128 200ns 200ns
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-4-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
x86 has no flush_tlb_range support in instruction level. Currently the
flush_tlb_range just implemented by flushing all page table. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush few lines from TLB, we can get the performance gain from later
remain TLB lines accessing.
But the 'invlpg' instruction costs much of time. Its execution time can
compete with cr3 rewriting, and even a bit more on SNB CPU.
So, on a 512 4KB TLB entries CPU, the balance points is at:
(512 - X) * 100ns(assumed TLB refill cost) =
X(TLB flush entries) * 100ns(assumed invlpg cost)
Here, X is 256, that is 1/2 of 512 entries.
But with the mysterious CPU pre-fetcher and page miss handler Unit, the
assumed TLB refill cost is far lower then 100ns in sequential access. And
2 HT siblings in one core makes the memory access more faster if they are
accessing the same memory. So, in the patch, I just do the change when
the target entries is less than 1/16 of whole active tlb entries.
Actually, I have no data support for the percentage '1/16', so any
suggestions are welcomed.
As to hugetlb, guess due to smaller page table, and smaller active TLB
entries, I didn't see benefit via my benchmark, so no optimizing now.
My micro benchmark show in ideal scenarios, the performance improves 70
percent in reading. And in worst scenario, the reading/writing
performance is similar with unpatched 3.4-rc4 kernel.
Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
'always':
multi thread testing, '-t' paramter is thread number:
with patch unpatched 3.4-rc4
./mprotect -t 1 14ns 24ns
./mprotect -t 2 13ns 22ns
./mprotect -t 4 12ns 19ns
./mprotect -t 8 14ns 16ns
./mprotect -t 16 28ns 26ns
./mprotect -t 32 54ns 51ns
./mprotect -t 128 200ns 199ns
Single process with sequencial flushing and memory accessing:
with patch unpatched 3.4-rc4
./mprotect 7ns 11ns
./mprotect -p 4096 -l 8 -n 10240
21ns 21ns
[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
has additional performance numbers. ]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and
instruction TLB, second level is shared TLB for both data and instructions.
For hupe page TLB, usually there is just one level and seperated by 2MB/4MB
and 1GB.
Although each levels TLB size is important for performance tuning, but for
genernal and rude optimizing, last level TLB entry number is suitable. And
in fact, last level TLB always has the biggest entry number.
This patch will get the biggest TLB entry number and use it in furture TLB
optimizing.
Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other
function and data will be released after system boot up.
For all kinds of x86 vendor friendly, vendor specific code was moved to its
specific files.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add hwmod support for the EMAC (and MDIO)
ethernet controller that's on the am35x
family of SoC's.
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
[paul@pwsan.com: updated subject line; updated to apply on v3.5-rc4;
added comments to hwmod data regarding IPSS]
Signed-off-by: Paul Walmsley <paul@pwsan.com>
During kernel init, the AM3505/AM3517 UART4 cannot complete its softreset:
omap_hwmod: uart4: softreset failed (waited 10000 usec)
This also results in another warning later in the boot process:
omap_hwmod: uart4: enabled state can only be entered from initialized, idle, or disabled state
From empirical observation, the AM35xx UART4 IP block requires either
uart1_fck or uart2_fck to be enabled while UART4 resets. Otherwise
the reset will never complete. So this patch adds uart1_fck as an
optional clock for UART4 and adds the appropriate hwmod flag to cause
uart1_fck to be enabled during the reset process. (The choice of
uart1_fck over uart2_fck was arbitrary.)
Unfortunately this observation raises many questions. Is it necessary
for uart1_fck or uart2_fck to be controlled with uart4_fck for the
UART4 to work correctly? What exactly do the AM35xx UART4 clock
tree and the related PRCM idle management FSMs look like? If anyone
has the ability to answer these questions through empirical functional
testing, or hardware information from the AM35xx designers, it would
be greatly appreciated.
Cc: Benoît Cousson <b-cousson@ti.com>
Cc: Kyle Manna <kyle.manna@fuel7.com>
Cc: Mark A. Greer <mgreer@animalcreek.com>
Cc: Ranjith Lohithakshan <ranjithl@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Tested-by: Mark A. Greer <mgreer@animalcreek.com>
Add missing terminators to the arrays of IRQ, DMA, and address space
structure records in the AM35xx UART4 hwmod data. Without these
terminators, the following warnings appear on boot:
omap_uart.3: failed to claim resource 58
omap_device: omap_uart: build failed (-16)
WARNING: at /home/paul/linux/arch/arm/mach-omap2/serial.c:375 omap_serial_init_port+0x198/0x284()
Could not build omap_device for omap_uart: uart4.
Also, AM35xx uart4_fck has an incorrect parent clock pointer. Fix it
and clean up a whitespace issue.
Fix some incorrectly-named macros related to AM35xx UART4.
Cc: Kyle Manna <kyle.manna@fuel7.com>
Cc: Mark A. Greer <mgreer@animalcreek.com>
Cc: Ranjith Lohithakshan <ranjithl@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Tested-by: Mark A. Greer <mgreer@animalcreek.com>
Partially fix the hwmod data for the AM35xx USB OTG hwmod. This
should resolve the following boot warning on AM35xx platforms:
omap_hwmod: am35x_otg_hs: cannot be enabled for reset (3)
While here, also fix the clkdev records, to avoid warnings about
duplicate clock aliases.
The hwmod is also connected to the wrong interconnect. It should be
connected to the IPSS, not the L4 CORE. But that is left for a future
fix, since it probably has a dependency on some hwmod core changes.
Cc: Felipe Balbi <balbi@ti.com>
Cc: Hema HK <hemahk@ti.com>
Cc: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Tested-by: Mark A. Greer <mgreer@animalcreek.com>
Pull a m68knommu fix from Greg Ungerer:
"It contains a single fix for breakage using the Freescale FEC eth
driver on ColdFire CPUs."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: define a local devm_clk_get() function
Fix problem that was introduced with patch "s390/smp: make absolute
lowcore / cpu restart parameter". After that patch the "dumpreipl"
shutdown action does not work any more. To fix the problem we have
to assign "reipl_block_actual" instead of "&reipl_block_actual"
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
OMAP4470 currently fails to boot, printing various messages such as ...
omap_hwmod: mpu: cannot clk_get main_clk dpll_mpu_m2_ck
omap_hwmod: mpu: cannot _init_clocks
------------[ cut here ]------------
WARNING: at arch/arm/mach-omap2/omap_hwmod.c:2062 _init+0x2a0/0x2e4()
omap_hwmod: mpu: couldn't init clocks
Modules linked in:
[<c001c7fc>] (unwind_backtrace+0x0/0xf4) from [<c0043c64>] (warn_slowpath_common+0x4c/0x64)
[<c0043c64>] (warn_slowpath_common+0x4c/0x64) from [<c0043d10>] (warn_slowpath_fmt+0x30/0x40)
[<c0043d10>] (warn_slowpath_fmt+0x30/0x40) from [<c0674208>] (_init+0x2a0/0x2e4)
[<c0674208>] (_init+0x2a0/0x2e4) from [<c067428c>] (omap_hwmod_setup_one+0x40/0x60)
[<c067428c>] (omap_hwmod_setup_one+0x40/0x60) from [<c0674280>] (omap_hwmod_setup_one+0x34/0x60)
[<c0674280>] (omap_hwmod_setup_one+0x34/0x60) from [<c06726f4>] (omap_dm_timer_init_one+0x30/0x250)
[<c06726f4>] (omap_dm_timer_init_one+0x30/0x250) from [<c0672930>] (omap2_gp_clockevent_init+0x1c/0x108)
[<c0672930>] (omap2_gp_clockevent_init+0x1c/0x108) from [<c0672c60>] (omap4_timer_init+0x10/0x5c)
[<c0672c60>] (omap4_timer_init+0x10/0x5c) from [<c066c418>] (time_init+0x20/0x30)
[<c066c418>] (time_init+0x20/0x30) from [<c0668814>] (start_kernel+0x1b0/0x304)
[<c0668814>] (start_kernel+0x1b0/0x304) from [<80008044>] (0x80008044)
---[ end trace 1b75b31a2719ed1c ]---
The problem is that currently none of the clocks are being registered for
OMAP4470 devices and so on boot-up no clocks can be found and the kernel panics.
This fix allows the kernel to boot without failure using a simple RAMDISK file
system on OMAP4470 blaze board.
Per feedback from Paul and Benoit the 4470 clock data is incomplete for new
modules such as the 2D graphics block that has been added to the 4470.
Therefore add a warning to indicate that the clock data is incomplete.
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Benoit Cousson <b-cousson@ti.com>
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
[tony@atomide.com: updated comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Add support for memory controller on Calxeda Highbank platforms. Highbank
platforms support a single 4GB mini-DIMM with 1-bit correction and 2-bit
detection.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
prom_meminit() and prom_ranges_init() are declared in <asm/oplib_32.h>.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pcibios_assign_resource() isn't used anywhere; remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
m68k's asm/shm.h header has been part of the tree ever since m68k
support got added in v1.3.94. (It started as /include/asm-m68k/shm.h and
moved to its current location a few years ago.) It seems it was never
used: no file ever included it and nothing used the macros it defines.
(Actually, from v2.5.46 until v2.6.29-rc3 it was included by m68knommu's
asm/shm.h. But that header was just a very thin wrapper for this header
and was itself unused too.)
This header can safely be removed.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
This was copied from SPARC, but isn't relevant for the supported Sun-3
models.
[Geert] Also remove the related extern declarations, and update the
comment about prom_init().
Reported-by: Sarah Nadi <snadi@uwaterloo.ca>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
BSS_SECTION() provides the __bss_{start,stop} symbols, so there's no need
to wrap our own _[se]bss around it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer<gerg@uclinux.org>
The standard (see BSS_SECTION() in <asm-generic/vmlinux.lds.h> and
<asm-generic/sections.h>) symbol for the end of BSS is __bss_stop.
This allows to remove all local declarations that have been added to
several architectures just to please CONFIG_MTD_UCLINUX.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Greg Ungerer <gerg@uclinux.org>
All of the current Linux supported ColdFire CPUs handle unaligned
memory accesses. So remove the CONFIG_CPU_HAS_NO_UNALIGNED option
selection for ColdFire. If we ever support a specific ColdFire CPU
that does not support unaligned accesses then we can insert the
CONFIG_CPU_HAS_NO_UNALIGNED for that specific CPU type.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
There are five entirely unused headers in arch/m68k/include/asm. Nothing
includes these headers. And a few tests found no hits for the things
they provide (which makes sense).
MC68332.h, mac_mouse.h, and mcfmbus.h are all unused since at least
v2.6.12-rc2 (I didn't bother looking further back than that).
apollodma.h is unused since v2.6.19: commit
2ed0ce5b57 ("m68k/Apollo: Remove obsolete
arch/m68k/apollo/dma.c") removed the last file interested in that
header.
And everything interested in <asm/sbus.h> was removed in the v2.6.28
release cycle. The last occurrence of "sbus.h" was deleted with commit
0c0db98b50 ("sparc: Remove
Documentation/sparc/sbus_drivers.txt"). I'm not sure whether anything
relevant for m68k was included in v2.6.27, but it doesn't really matter.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Greg Ungerer<gerg@uclinux.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Now that serpent-sse2 glue code has been made generic, it can be split to
separate module.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Block cipher implementations in arch/x86/crypto/ contain common glue code that
is currently duplicated in each module (camellia-x86_64, twofish-x86_64-3way,
twofish-avx, serpent-sse2 and serpent-avx). This patch prepares serpent-sse2
glue into generic glue code for all 128bit block ciphers to use in
arch/x86/crypto.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move ablk-* functions to separate module to share common code between cipher
implementations.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When the nx driver was pulled, the Makefile that actually
builds it is arch/powerpc/Makefile. This is unnatural.
This patch moves the line that builds the nx driver from
arch/powerpc/Makefile to drivers/crypto/Makefile where it
belongs.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
OMAP3, OMAP4 and AM33xx share some common data like, clksel_rate
oscillator clock input (Virtual clock nodes), required for
clock tree; so move common data to common data file so that it
can be reused.
[hvaibhav@ti.com: Created separate commit from Paul's developement
branch]
Signed-off-by: Vaibhav Hiremath <hvaibhav@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
an oops on boot for all omap3xxx platforms that use usbhs_omap for
EHCI. The actual oops comes from faulty ehci-omap cleanup, but the
failure caused by the change is evidenced here:
[ 3.655059] ehci-omap ehci-omap.0: utmi_p1_gfclk failed error:-2
[ 3.661376] ehci-omap: probe of ehci-omap.0 failed with error -2
utmi_p1_gfclk is a clock that exists on OMAP4, but not OMAP3. In the
OMAP3 case, it is configured as a dummy clock. However, OMAP4 lists
the dev_id as NULL, but OMAP3 lists it as "usbhs_omap".
Attempting to get that clock from ehci-omap then fails. The solution
is to just change the clock3xxx_data.c for dummy clocks used in the
errata fix to match the dev_id, NULL, used in clock44xx_data.c.
Tested on BB-xM.
Signed-off-by: Russ Dill <Russ.Dill@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes the following build warning:
arch/arm/plat-samsung/dma-ops.c:129:2: warning: initialization from incompatible pointer type [enabled by default]
arch/arm/plat-samsung/dma-ops.c:129:2: warning: (near initialization for 'dmadev_ops.release') [enabled by default]
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Commit 20ef9e08 ("ARM: EXYNOS: Support DMA for EXYNOS5250 SoC")
renamed EXYNOS4_DEV_DMA to EXYNOS_DEV_DMA. But some machine entries
still had EXYNOS4_DEV_DMA. Changed them to EXYNOS_DEV_DMA.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Since SYSRAM set the L2 cache latency on EXYNOS5 SoCs,
no longer need that in the kernel. It helps to reduce
booting time (no need cache disable and cache enable).
Signed-off-by: Boojin Kim <boojin.kim@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Rename the driver name of the clock entry of Tegra APBDMA to
tegra-apbdma from of tegra-dma.
This name is more aligned towards the movement of dmaengine based
new DMA driver.
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
The PCI core provides a generic pcibios_setup() routine. Drop this
architecture-specific version in favor of that.
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
commit 8259573b (ARM: OMAP2+: nand: Make board_onenand_init() visible
to board code) broke the build for configs with OneNAND disabled. By
removing the static in the header file, it created a duplicate definition
in the .c and the .h files, resuling in a build error:
/work/kernel/omap/dev/arch/arm/mach-omap2/board-flash.c:102:111: error: redefinition of 'board_onenand_init'
/work/kernel/omap/dev/arch/arm/mach-omap2/board-flash.h:56:51: note: previous definition of 'board_onenand_init' was here
make[2]: *** [arch/arm/mach-omap2/board-flash.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [arch/arm/mach-omap2] Error 2
make: *** [sub-make] Error 2
Fix this by removing the duplicate dummy entry from the C file.
Cc: Enric Balletbò i Serra <eballetbo@gmail.com>
Cc: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>