1
0
Commit Graph

677432 Commits

Author SHA1 Message Date
Linus Torvalds
414975eb76 Merge tag 'befs-v4.12-rc1' of git://github.com/luisbg/linux-befs
Pull befs fix from Luis de Bethencourt:
 "One fix from Fabian Frederick making the nfs client still work after a
  cache drop"

* tag 'befs-v4.12-rc1' of git://github.com/luisbg/linux-befs:
  befs: make export work with cold dcache
2017-05-05 13:33:38 -07:00
Linus Torvalds
58017a3e28 Merge tag 'initramfs-fix-4.12-rc1' of git://github.com/stffrdhrn/linux
Pull initramfs fix from Stafford Horne:
 "This is a fix for an issue that has caused 4.11 to not boot on
  OpenRISC. I should have caught this during the 4.11 cycle but I had
  been busy on testing some other series of patches.

  I would have considered pushing it though a different path but Al Viro
  suggested submitting directly to you.

  Also, its just one as I havent really got anything else ready on my
  queue for 4.12.

  Summary:

   - Ensure fput() flush is done even for builtin initramfs"

* tag 'initramfs-fix-4.12-rc1' of git://github.com/stffrdhrn/linux:
  initramfs: Always do fput() and load modules after rootfs populate
2017-05-05 13:30:11 -07:00
Fabian Frederick
6b4657667b fs/affs: add rename exchange
Process RENAME_EXCHANGE in affs_rename2() adding static
affs_xrename() based on affs_rename().

We remove headers from respective directories then
affect bh to other inode directory entries for swapping.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-05 15:24:52 -04:00
Fabian Frederick
c6184028a7 fs/affs: add rename2 to prepare multiple methods
Currently AFFS only supports RENAME_NOREPLACE.
This patch isolates that method to a static function to
prepare RENAME_EXCHANGE addition.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-05 15:24:52 -04:00
Bob Peterson
ed17545d01 GFS2: Allow glocks to be unlocked after withdraw
This bug fixes a regression introduced by patch 0d1c7ae9d8.

The intent of the patch was to stop promoting glocks after a
file system is withdrawn due to a variety of errors, because doing
so results in a BUG(). (You should be able to unmount after a
withdraw rather than having the kernel panic.)

Unfortunately, it also stopped demotions, so glocks could not be
unlocked after withdraw, which means the unmount would hang.

This patch allows function do_xmote to demote locks to an
unlocked state after a withdraw, but not promote them.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-05-05 14:19:28 -05:00
Eryu Guan
161f55efba xfs: fix use-after-free in xfs_finish_page_writeback
Commit 28b783e47a ("xfs: bufferhead chains are invalid after
end_page_writeback") fixed one use-after-free issue by
pre-calculating the loop conditionals before calling bh->b_end_io()
in the end_io processing loop, but it assigned 'next' pointer before
checking end offset boundary & breaking the loop, at which point the
bh might be freed already, and caused use-after-free.

This is caught by KASAN when running fstests generic/127 on sub-page
block size XFS.

[ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
[ 2747.868840] ==================================================================
[ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
...
[ 2747.918245] Call Trace:
[ 2747.920975]  dump_stack+0x63/0x84
[ 2747.924673]  kasan_object_err+0x21/0x70
[ 2747.928950]  kasan_report+0x271/0x530
[ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.938409]  ? end_page_writeback+0xce/0x110
[ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
[ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
[ 2747.958197]  process_one_work+0x5ff/0x1000
[ 2747.962766]  worker_thread+0xe4/0x10e0
[ 2747.966946]  kthread+0x2d3/0x3d0
[ 2747.970546]  ? process_one_work+0x1000/0x1000
[ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
[ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
[ 2747.985706]  ? do_page_fault+0x30/0x80
[ 2747.989887]  ret_from_fork+0x2c/0x40
[ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
[ 2748.001155] Allocated:
[ 2748.003782] PID = 8327
[ 2748.006411]  save_stack_trace+0x1b/0x20
[ 2748.010688]  save_stack+0x46/0xd0
[ 2748.014383]  kasan_kmalloc+0xad/0xe0
[ 2748.018370]  kasan_slab_alloc+0x12/0x20
[ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
[ 2748.027024]  alloc_buffer_head+0x22/0xc0
[ 2748.031399]  alloc_page_buffers+0xd1/0x250
[ 2748.035968]  create_empty_buffers+0x30/0x410
[ 2748.040730]  create_page_buffers+0x120/0x1b0
[ 2748.045493]  __block_write_begin_int+0x17a/0x1800
[ 2748.050740]  iomap_write_begin+0x100/0x2f0
[ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
[ 2748.060362]  iomap_apply+0x157/0x270
[ 2748.064347]  iomap_zero_range+0x5a/0x80
[ 2748.068624]  iomap_truncate_page+0x6b/0xa0
[ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
[ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
[ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
[ 2748.088838]  vfs_fallocate+0x2cf/0x780
[ 2748.093021]  SyS_fallocate+0x48/0x80
[ 2748.097006]  do_syscall_64+0x18a/0x430
[ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
[ 2748.105948] Freed:
[ 2748.108189] PID = 8327
[ 2748.110816]  save_stack_trace+0x1b/0x20
[ 2748.115093]  save_stack+0x46/0xd0
[ 2748.118788]  kasan_slab_free+0x73/0xc0
[ 2748.122969]  kmem_cache_free+0x7a/0x200
[ 2748.127247]  free_buffer_head+0x41/0x80
[ 2748.131524]  try_to_free_buffers+0x178/0x250
[ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
[ 2748.141563]  try_to_release_page+0x100/0x180
[ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
[ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
[ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
[ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
[ 2748.168462]  vfs_fallocate+0x2cf/0x780
[ 2748.172642]  SyS_fallocate+0x48/0x80
[ 2748.176629]  do_syscall_64+0x18a/0x430
[ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a

Fixed it by checking on offset against end & breaking out first,
dereference bh only if there're still bufferheads to process.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-05 12:16:48 -07:00
Linus Torvalds
ab182e67ec Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:

 - kdump support, including two necessary memblock additions:
   memblock_clear_nomap() and memblock_cap_memory_range()

 - ARMv8.3 HWCAP bits for JavaScript conversion instructions, complex
   numbers and weaker release consistency

 - arm64 ACPI platform MSI support

 - arm perf updates: ACPI PMU support, L3 cache PMU in some Qualcomm
   SoCs, Cortex-A53 L2 cache events and DTLB refills, MAINTAINERS update
   for DT perf bindings

 - architected timer errata framework (the arch/arm64 changes only)

 - support for DMA_ATTR_FORCE_CONTIGUOUS in the arm64 iommu DMA API

 - arm64 KVM refactoring to use common system register definitions

 - remove support for ASID-tagged VIVT I-cache (no ARMv8 implementation
   using it and deprecated in the architecture) together with some
   I-cache handling clean-up

 - PE/COFF EFI header clean-up/hardening

 - define BUG() instruction without CONFIG_BUG

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
  arm64: Fix the DMA mmap and get_sgtable API with DMA_ATTR_FORCE_CONTIGUOUS
  arm64: Print DT machine model in setup_machine_fdt()
  arm64: pmu: Wire-up Cortex A53 L2 cache events and DTLB refills
  arm64: module: split core and init PLT sections
  arm64: pmuv3: handle pmuv3+
  arm64: Add CNTFRQ_EL0 trap handler
  arm64: Silence spurious kbuild warning on menuconfig
  arm64: pmuv3: use arm_pmu ACPI framework
  arm64: pmuv3: handle !PMUv3 when probing
  drivers/perf: arm_pmu: add ACPI framework
  arm64: add function to get a cpu's MADT GICC table
  drivers/perf: arm_pmu: split out platform device probe logic
  drivers/perf: arm_pmu: move irq request/free into probe
  drivers/perf: arm_pmu: split cpu-local irq request/free
  drivers/perf: arm_pmu: rename irq request/free functions
  drivers/perf: arm_pmu: handle no platform_device
  drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()
  drivers/perf: arm_pmu: factor out pmu registration
  drivers/perf: arm_pmu: fold init into alloc
  drivers/perf: arm_pmu: define armpmu_init_fn
  ...
2017-05-05 12:11:37 -07:00
Vineet Gupta
868a65307d ARC: mm: fix build failure in linux-next for UP builds
kisskb build service reported ARC defconfig build failures in linux-next

| arch/arc/include/asm/mmu.h:75:21: error: 'NR_CPUS' undeclared here (not in a function)
| make[3]: *** [arch/arc/mm/ioremap.o] Error 1
| make[2]: *** [arch/arc/mm] Error 2
| make[1]: *** [arch/arc] Error 2

which I bisected to a subtle side-effect of a totally benign mm patch
("mm, vmalloc: properly track vmalloc users") which caused a header
include chain deviation - asm/mmu.h using NR_CPUS before including
linux/threads.h

Fix that by adding the dependnet header and while it at fix a related
header to include linux headers aheads of asm headers as sometimes that
slso triggers such issues !

Reported-by: noreply@ellerman.id.au
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-05-05 12:06:33 -07:00
Rakesh Pandit
2c041afc5a net: alx: handle pci_alloc_irq_vectors return correctly
It was introduced while switching to pci_alloc_irq_vectors recently
and fixes:

[   60.527052] alx 0000:03:00.0 enp3s0: Enabling MSI-X interrupts failed!
[   60.529323] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
[   60.531589] IP: alx_alloc_napis+0xe6/0x1e0 [alx]
[   60.533831] PGD 0
[   60.533833] P4D 0

[   60.540559] Oops: 0002 [#1] SMP
[   60.542759] Modules linked in: ebtables ip6table_filter ip6_tables.....
[   60.549990]  drm_kms_helper drm crc32c_intel alx serio_raw mdio wmi video i2c_hid uas usb_storage
[   60.551404] CPU: 0 PID: 999 Comm: NetworkManager Not tainted 4.11.0+ #1
[   60.552813] Hardware name: Acer Predator G9-591/Mustang_SLS, BIOS V1.10 03/03/2016
[   60.554219] task: ffff8804ae833c00 task.stack: ffffc90003eec000
[   60.555383] RIP: 0010:alx_alloc_napis+0xe6/0x1e0 [alx]
[   60.556615] RSP: 0018:ffffc90003eef660 EFLAGS: 00010286
[   60.557787] RAX: ffff8804962835a0 RBX: ffff8804aee8a8c0 RCX: 0000000000000000
[   60.558987] RDX: 0000000000000060 RSI: 0000000000000000 RDI: ffff880496283600
[   60.559979] RBP: ffffc90003eef688 R08: ffff8804c1c1e7e0 R09: ffff8804962835a0
[   60.560978] R10: ffff8804962835a0 R11: 0000000000000102 R12: 0000000000000000
[   60.561974] R13: 0000000000000000 R14: ffff8804aee8aaf0 R15: ffffffffa0052ea0
[   60.562974] FS:  00007f1cecbc9940(0000) GS:ffff8804c1c00000(0000) knlGS:0000000000000000
[   60.564003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.564884] CR2: 00000000000000b8 CR3: 0000000496025000 CR4: 00000000003406f0
[   60.565782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   60.566676] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   60.567560] Call Trace:
[   60.568500]  __alx_open+0xa2/0x570 [alx]
[   60.569355]  ? notifier_call_chain+0x4a/0x70
[   60.570144]  alx_open+0x17/0x20 [alx]
[   60.570909]  __dev_open+0xc6/0x140
[   60.571682]  ? _raw_spin_unlock_bh+0x1a/0x20
[   60.572469]  __dev_change_flags+0x9d/0x160
[   60.573252]  dev_change_flags+0x29/0x60
[   60.574070]  do_setlink+0x32e/0xc90
[   60.574815]  ? ttwu_do_activate+0x77/0x80
[   60.575544]  ? drm_fb_helper_dirty.isra.17+0xc7/0xe0 [drm_kms_helper]
[   60.576273]  ? drm_fb_helper_cfb_imageblit+0x30/0x40 [drm_kms_helper]
[   60.577004]  ? bit_putcs+0x2f7/0x560
[   60.577729]  ? nla_parse+0x35/0x140
[   60.578518]  rtnl_newlink+0x7d3/0x900
[   60.579280]  ? security_capset+0x30/0x80
[   60.580029]  ? ns_capable_common+0x68/0x80
[   60.580747]  ? ns_capable+0x13/0x20
[   60.581453]  rtnetlink_rcv_msg+0xee/0x220
[   60.582198]  ? rtnl_newlink+0x900/0x900
[   60.582909]  netlink_rcv_skb+0xe7/0x120
[   60.583601]  rtnetlink_rcv+0x28/0x30
[   60.584303]  netlink_unicast+0x18c/0x220
[   60.585002]  netlink_sendmsg+0x2ba/0x3b0
[   60.585703]  sock_sendmsg+0x38/0x50
[   60.586436]  ___sys_sendmsg+0x2b6/0x2d0
[   60.587123]  ? lockref_put_or_lock+0x5e/0x80
[   60.587822]  ? dput+0x155/0x1d0
[   60.588518]  ? mntput+0x24/0x40
[   60.589215]  __sys_sendmsg+0x54/0x90
[   60.589907]  ? __sys_sendmsg+0x54/0x90
[   60.590627]  SyS_sendmsg+0x12/0x20
[   60.591333]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[   60.592021] RIP: 0033:0x7f1ceb44e3b0
[   60.592697] RSP: 002b:00007fffd7f0a2d0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[   60.593385] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1ceb44e3b0
[   60.594107] RDX: 0000000000000000 RSI: 00007fffd7f0a380 RDI: 000000000000000c
[   60.594798] RBP: 00007fffd7f0a800 R08: 0000000000000000 R09: 0000000000000000
[   60.595502] R10: 0000564ffbae6e20 R11: 0000000000000293 R12: 0000000000000001
[   60.596200] R13: 0000000000000002 R14: 0000000000000010 R15: 00007fffd7f0a4d0
[   60.596899] Code: ed 85 c9 0f 8f ec 00 00 00 48 8b 3d 9d 97 1a e2 ba 50 00 00 00 be c0 80 40 01 4c 8b a3 30 02 00 00 e8 ff e5 1d e1 48 85 c0 74 a3 <49> 89 84 24 b8 00 00 00 48 8b 93 30 02 00 00 48 8b 4b 08 48 89
[   60.597642] RIP: alx_alloc_napis+0xe6/0x1e0 [alx] RSP: ffffc90003eef660
[   60.598427] CR2: 00000000000000b8

Fixes: f3297f68 ("net: alx: switch to pci_alloc_irq_vectors")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-05 14:52:03 -04:00
Mike Snitzer
10add84e27 dm cache metadata: fail operations if fail_io mode has been established
Otherwise it is possible to trigger crashes due to the metadata being
inaccessible yet these methods don't safely account for that possibility
without these checks.

Cc: stable@vger.kernel.org
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-05 14:40:13 -04:00
Linus Torvalds
7246f60068 Merge tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
 "Highlights include:

   - Larger virtual address space on 64-bit server CPUs. By default we
     use a 128TB virtual address space, but a process can request access
     to the full 512TB by passing a hint to mmap().

   - Support for the new Power9 "XIVE" interrupt controller.

   - TLB flushing optimisations for the radix MMU on Power9.

   - Support for CAPI cards on Power9, using the "Coherent Accelerator
     Interface Architecture 2.0".

   - The ability to configure the mmap randomisation limits at build and
     runtime.

   - Several small fixes and cleanups to the kprobes code, as well as
     support for KPROBES_ON_FTRACE.

   - Major improvements to handling of system reset interrupts,
     correctly treating them as NMIs, giving them a dedicated stack and
     using a new hypervisor call to trigger them, all of which should
     aid debugging and robustness.

   - Many fixes and other minor enhancements.

  Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple,
  Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton
  Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt,
  Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy,
  Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy,
  Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin,
  Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J
  Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew
  R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver
  O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell
  Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C.
  Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar,
  Yang Shi"

* tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
  powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it
  powerpc/powernv: Fix TCE kill on NVLink2
  powerpc/mm/radix: Drop support for CPUs without lockless tlbie
  powerpc/book3s/mce: Move add_taint() later in virtual mode
  powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body
  powerpc/smp: Document irq enable/disable after migrating IRQs
  powerpc/mpc52xx: Don't select user-visible RTAS_PROC
  powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset()
  powerpc/eeh: Clean up and document event handling functions
  powerpc/eeh: Avoid use after free in eeh_handle_special_event()
  cxl: Mask slice error interrupts after first occurrence
  cxl: Route eeh events to all drivers in cxl_pci_error_detected()
  cxl: Force context lock during EEH flow
  powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST
  powerpc/xmon: Teach xmon oops about radix vectors
  powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids
  powerpc/pseries: Enable VFIO
  powerpc/powernv: Fix iommu table size calculation hook for small tables
  powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc
  powerpc: Add arch/powerpc/tools directory
  ...
2017-05-05 11:36:44 -07:00
Linus Torvalds
e579dde654 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "This is a set of small fixes that were mostly stumbled over during
  more significant development. This proc fix and the fix to
  posix-timers are the most significant of the lot.

  There is a lot of good development going on but unfortunately it
  didn't quite make the merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Fix unbalanced hard link numbers
  signal: Make kill_proc_info static
  rlimit: Properly call security_task_setrlimit
  signal: Remove unused definition of sig_user_definied
  ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET
  ipc: Remove unused declaration of recompute_msgmni
  posix-timers: Correct sanity check in posix_cpu_nsleep
  sysctl: Remove dead register_sysctl_root
2017-05-05 11:08:43 -07:00
Fabian Frederick
0795bf8357 nfs: use kmap/kunmap directly
This patch removes useless nfs_readdir_get_array() and
nfs_readdir_release_array() as suggested by Trond Myklebust

nfs_readdir() calls nfs_revalidate_mapping() before
readdir_search_pagecache() , nfs_do_filldir(), uncached_readdir()
so mapping should be correct.

While kmap() can't fail, all subsequent error checks were removed
as well as unused labels.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-05 13:01:33 -04:00
Hou Tao
59b86d85a7 NFS: always treat the invocation of nfs_getattr as cache hit when noac is on
When using 'ls -l' to display a large directory, if noac option is used,
in function nfs_getattr() nfs_need_revalidate_inode() will always be true
for NFSv3 and the nfs_entry cache of the directory will be flushed. The
flush will lead to a fully reread of the directory entries from server.

To prevent the unnecessary RPCs, we need to check whether or not the
noac option is used, and always report the invocation of nfs_getattr()
as cache hit instead cache miss when it's on.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-05 13:01:32 -04:00
Dave Wysochanski
5c737cb299 Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew
If memory allocation fails for the callback data, we need to put the nfs_client
or we end up with an elevated refcount.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-05 13:01:32 -04:00
Trond Myklebust
0048fdd066 NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
If the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION because we
are trunking, then RECLAIM_COMPLETE must handle that by calling
nfs4_schedule_session_recovery() and then retrying.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
2017-05-05 12:01:50 -04:00
Eric Dumazet
84b114b984 tcp: randomize timestamps on syncookies
Whole point of randomization was to hide server uptime, but an attacker
can simply start a syn flood and TCP generates 'old style' timestamps,
directly revealing server jiffies value.

Also, TSval sent by the server to a particular remote address vary
depending on syncookies being sent or not, potentially triggering PAWS
drops for innocent clients.

Lets implement proper randomization, including for SYNcookies.

Also we do not need to export sysctl_tcp_timestamps, since it is not
used from a module.

In v2, I added Florian feedback and contribution, adding tsoff to
tcp_get_cookie_sock().

v3 removed one unused variable in tcp_v4_connect() as Florian spotted.

Fixes: 95a22caee3 ("tcp: randomize tcp timestamp offsets for each connection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-05 12:00:11 -04:00
Arnd Bergmann
34bf129a7f fbdev: sti: don't select CONFIG_VT
While working on another build error, I ran into several variations of
this dependency loop:

subsection "Kconfig recursive dependency limitations"
drivers/input/Kconfig:8:	symbol INPUT is selected by VT
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/tty/Kconfig:12:	symbol VT is selected by FB_STI
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:677:	symbol FB_STI depends on FB
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:5:	symbol FB is selected by DRM_KMS_FB_HELPER
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/Kconfig:72:	symbol DRM_KMS_FB_HELPER is selected by DRM_KMS_CMA_HELPER
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/Kconfig:137:	symbol DRM_KMS_CMA_HELPER is selected by DRM_HDLCD
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/arm/Kconfig:6:	symbol DRM_HDLCD depends on OF
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/of/Kconfig:4:	symbol OF is selected by X86_INTEL_CE
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
arch/x86/Kconfig:523:	symbol X86_INTEL_CE depends on X86_IO_APIC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
arch/x86/Kconfig:1011:	symbol X86_IO_APIC depends on X86_LOCAL_APIC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
arch/x86/Kconfig:1005:	symbol X86_LOCAL_APIC depends on X86_UP_APIC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
arch/x86/Kconfig:980:	symbol X86_UP_APIC depends on PCI_MSI
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/pci/Kconfig:11:	symbol PCI_MSI is selected by AMD_IOMMU
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/iommu/Kconfig:106:	symbol AMD_IOMMU depends on IOMMU_SUPPORT
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/iommu/Kconfig:5:	symbol IOMMU_SUPPORT is selected by DRM_ETNAVIV
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/etnaviv/Kconfig:2:	symbol DRM_ETNAVIV depends on THERMAL
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/thermal/Kconfig:5:	symbol THERMAL is selected by ACPI_VIDEO
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/acpi/Kconfig:183:	symbol ACPI_VIDEO is selected by INPUT

This doesn't currently show up as I fixed the 'THERMAL' part of it,
but I noticed that the FB_STI dependency should not be there but
was introduced by slightly incorrect bug-fix patch that tried to
fix a link error.

Instead of selecting 'VT' to make us enter the drivers/video/console
directory at compile-time, it's sufficient to build the
drivers/video/console/sticore.c file by adding its directory
to when CONFIG_FB_STI is enabled. Alternatively, we could move the
sticore code to another directory that is always built when we
have at STI_CONSOLE or FB_STI enabled.

Fixes: 17085a9345 ("parisc: stifb: should depend on STI_CONSOLE")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2017-05-05 17:25:12 +02:00
Tobias Klauser
9051247dcf bridge: netlink: account for IFLA_BRPORT_{B, M}CAST_FLOOD size and policy
The attribute sizes for IFLA_BRPORT_MCAST_FLOOD and
IFLA_BRPORT_BCAST_FLOOD weren't accounted for in br_port_info_size()
when they were added. Do so now and also add the corresponding policy
entries:

Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Mike Manning <mmanning@brocade.com>
Fixes: b6cb5ac833 ("net: bridge: add per-port multicast flood flag")
Fixes: 99f906e9ad ("bridge: add per-port broadcast flood flag")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-05 11:21:03 -04:00
Björn Jacke
85435d7a15 CIFS: add misssing SFM mapping for doublequote
SFM is mapping doublequote to 0xF020

Without this patch creating files with doublequote fails to Windows/Mac

Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: stable <stable@vger.kernel.org>
2017-05-05 08:33:44 -05:00
Zhang Rui
bb4d5e38de Merge branch 'backup-thermal-shutdown' into next 2017-05-05 20:30:09 +08:00
Zhang Rui
a6128f47f7 Merge branches 'thermal-core' and 'thermal-intel' into next 2017-05-05 20:30:03 +08:00
Catalin Marinas
92f66f84d9 arm64: Fix the DMA mmap and get_sgtable API with DMA_ATTR_FORCE_CONTIGUOUS
While honouring the DMA_ATTR_FORCE_CONTIGUOUS on arm64 (commit
44176bb38f: "arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to
IOMMU"), the existing uses of dma_mmap_attrs() and dma_get_sgtable()
have been broken by passing a physically contiguous vm_struct with an
invalid pages pointer through the common iommu API.

Since the coherent allocation with DMA_ATTR_FORCE_CONTIGUOUS uses CMA,
this patch simply reuses the existing swiotlb logic for mmap and
get_sgtable.

Note that the current implementation of get_sgtable (both swiotlb and
iommu) is broken if dma_declare_coherent_memory() is used since such
memory does not have a corresponding struct page. To be addressed in a
subsequent patch.

Fixes: 44176bb38f ("arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to IOMMU")
Reported-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-05 11:41:35 +01:00
Fabian Frederick
dcfd9b215b befs: make export work with cold dcache
based on commit b3b42c0dea
("fs/affs: make export work with cold dcache")

This adds get_parent function so that nfs client can still work after
cache drop (Tested on NFS v4 with echo 3 > /proc/sys/vm/drop_caches)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2017-05-05 11:35:35 +01:00
Amir Goldstein
65f2673832 ovl: update documentation w.r.t. constant inode numbers
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:59 +02:00
Amir Goldstein
5b6c9053fb ovl: persistent inode numbers for upper hardlinks
An upper type non directory dentry that is a copy up target
should have a reference to its lower copy up origin.

There are three ways for an upper type dentry to be instantiated:
1. A lower type dentry that is being copied up
2. An entry that is found in upper dir by ovl_lookup()
3. A negative dentry is hardlinked to an upper type dentry

In the first case, the lower reference is set before copy up.
In the second case, the lower reference is found by ovl_lookup().
In the last case of hardlinked upper dentry, it is not easy to
update the lower reference of the negative dentry.  Instead,
drop the newly hardlinked negative dentry from dcache and let
the next access call ovl_lookup() to find its lower reference.

This makes sure that the inode number reported by stat(2) after
the hardlink is created is the same inode number that will be
reported by stat(2) after mount cycle, which is the inode number
of the lower copy up origin of the hardlink source.

NOTE that this does not fix breaking of lower hardlinks on copy
up, but only fixes the case of lower nlink == 1, whose upper copy
up inode is hardlinked in upper dir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:58 +02:00
Miklos Szeredi
5b712091a3 ovl: merge getattr for dir and nondir
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:58 +02:00
Amir Goldstein
72b608f085 ovl: constant st_ino/st_dev across copy up
When all layers are on the same underlying filesystem, let stat(2) return
st_dev/st_ino values of the copy up origin inode if it is known.

This results in constant st_ino/st_dev representation of files in an
overlay mount before and after copy up.

When the underlying filesystem support NFS exportfs, the result is also
persistent st_ino/st_dev representation before and after mount cycle.

Lower hardlinks are broken on copy up to different upper files, so we
cannot use the lower origin st_ino for those different files, even for the
same fs case.

When all overlay layers are on the same fs, use overlay st_dev for non-dirs
to get the correct result from du -x.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:58 +02:00
Amir Goldstein
b7a807dc20 ovl: persistent inode number for directories
stat(2) on overlay directories reports the overlay temp inode
number, which is constant across copy up, but is not persistent.

When all layers are on the same fs, report the copy up origin inode
number for directories.

This inode number is persistent, unique across the overlay mount and
constant across copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:58 +02:00
Amir Goldstein
595485033d ovl: set the ORIGIN type flag
For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE.
Define a new type flag OVL_TYPE_ORIGIN to indicate that an entry holds a
reference to its lower copy up origin.

For directory entries ORIGIN := MERGE && UPPER. For non-dir entries ORIGIN
means that a lower type dentry has been recently copied up or that we were
able to find the copy up origin from overlay.origin xattr.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:58 +02:00
Amir Goldstein
a9d019573e ovl: lookup non-dir copy-up-origin by file handle
If overlay.origin xattr is found on a non-dir upper inode try to get lower
dentry by calling exportfs_decode_fh().

On failure to lookup by file handle to lower layer, do not lookup the copy
up origin by name, because the lower found by name could be another file in
case the upper file was renamed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:58 +02:00
Amir Goldstein
c22205d058 ovl: use an auxiliary var for overlay root entry
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:58 +02:00
Amir Goldstein
3a1e819b4e ovl: store file handle of lower inode on copy up
Sometimes it is interesting to know if an upper file is pure upper or a
copy up target, and if it is a copy up target, it may be interesting to
find the copy up origin.

This will be used to preserve lower inode numbers across copy up.

Store the lower inode file handle in upper inode extended attribute
overlay.origin on copy up to use it later for these cases.  Store the lower
filesystem uuid along side the file handle, so we can validate that we are
looking for the origin file in the original fs.

If lower fs does not support NFS export ops store a zero sized xattr so we
can always use the overlay.origin xattr to distinguish between a copy up
and a pure upper inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:58 +02:00
Amir Goldstein
7bcd74b98d ovl: check if all layers are on the same fs
Some features can only work when all layers are on the same fs.  Test this
condition during mount time, so features can check them later.

Add helper ovl_same_sb() to return the common super block in case all
layers are on the same fs.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05 11:38:57 +02:00
Boris Ostrovsky
d162809f85 xen/x86: Do not call xen_init_time_ops() until shared_info is initialized
Routines that are set by xen_init_time_ops() use shared_info's
pvclock_vcpu_time_info area. This area is not properly available until
shared_info is mapped in xen_setup_shared_info().

This became especially problematic due to commit dd759d93f4 ("x86/timers:
Add simple udelay calibration") where we end up reading tsc_to_system_mul
from xen_dummy_shared_info (i.e. getting zero value) and then trying
to divide by it in pvclock_tsc_khz().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-05 10:43:15 +02:00
Juergen Gross
40f4ac0b51 x86/xen: fix xsave capability setting
Commit 690b7f10b4f9f ("x86/xen: use capabilities instead of fake cpuid
values for xsave") introduced a regression as it tried to make use of
the fixup feature before it being available.

Fall back to the old variant testing via cpuid().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-05 10:43:10 +02:00
Jim Mattson
2e5b0bd9cc kvm: nVMX: Don't validate disabled secondary controls
According to the SDM, if the "activate secondary controls" primary
processor-based VM-execution control is 0, no checks are performed on
the secondary processor-based VM-execution controls.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-05 10:08:31 +02:00
Keerthy
ef1d87e06a thermal: core: Add a back up thermal shutdown mechanism
orderly_poweroff is triggered when a graceful shutdown
of system is desired. This may be used in many critical states of the
kernel such as when subsystems detects conditions such as critical
temperature conditions. However, in certain conditions in system
boot up sequences like those in the middle of driver probes being
initiated, userspace will be unable to power off the system in a clean
manner and leaves the system in a critical state. In cases like these,
the /sbin/poweroff will return success (having forked off to attempt
powering off the system. However, the system overall will fail to
completely poweroff (since other modules will be probed) and the system
is still functional with no userspace (since that would have shut itself
off).

However, there is no clean way of detecting such failure of userspace
powering off the system. In such scenarios, it is necessary for a backup
workqueue to be able to force a shutdown of the system when orderly
shutdown is not successful after a configurable time period.

Reported-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Acked-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-05-05 16:01:45 +08:00
Keerthy
e441fd6866 thermal: core: Allow orderly_poweroff to be called only once
thermal_zone_device_check --> thermal_zone_device_update -->
handle_thermal_trip --> handle_critical_trips --> orderly_poweroff

The above sequence happens every 250/500 mS based on the configuration.
The orderly_poweroff function is getting called every 250/500 mS.
With a full fledged file system it takes at least 5-10 Seconds to
power off gracefully.

In that period due to the thermal_zone_device_check triggering
periodically the thermal work queues bombard with
orderly_poweroff calls multiple times eventually leading to
failures in gracefully powering off the system.

Make sure that orderly_poweroff is called only once.

Signed-off-by: Keerthy <j-keerthy@ti.com>
Acked-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-05-05 16:01:44 +08:00
Brian Bian
68b2440b2a Thermal: Intel SoC DTS: Change interrupt request behavior
The interrupt request call in Intel SoC DTS driver may fail if
there is no underlying BIOS support. However, the user space
thermal daemon can still use the thermal zones created by the
SoC DTS driver in polling mode, therefore, instead of bailing
out on interrupt request failures, it is better just to log
a warning message and continue the init process.

Signed-off-by: Brian Bian <brian.bian@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-05-05 16:00:10 +08:00
Lukasz Luba
771ffa14ea trace: thermal: add another parameter 'power' to the tracing function
This patch adds another parameter to the trace function:
trace_thermal_power_devfreq_get_power().

In case when we call directly driver's code for the real power,
we do not have static/dynamic_power values. Instead we get total
power in the '*power' value. The 'static_power' and
'dynamic_power' are set to 0.

Therefore, we have to trace that '*power' value in this scenario.

CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
Acked-by: Javi Merino <javi.merino@kernel.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
2017-05-05 15:54:45 +08:00
Lukasz Luba
2be83da85a thermal: devfreq_cooling: add new interface for direct power read
This patch introduces a new interface for device drivers connected to
devfreq_cooling in the thermal framework: get_real_power().

Some devices have more sophisticated methods (like power counters)
to approximate the actual power that they use.
In the previous implementation we had a pre-calculated power
table which was then scaled by 'utilization'
('busy_time' and 'total_time' taken from devfreq 'last_status').

With this new interface the driver can provide more precise data
regarding actual power to the thermal governor every time the power
budget is calculated. We then use this value and calculate the real
resource utilization scaling factor.

Reviewed-by: Chris Diamand <chris.diamand@arm.com>
Acked-by: Javi Merino <javi.merino@kernel.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
2017-05-05 15:54:45 +08:00
Lukasz Luba
e34cab4cd1 thermal: devfreq_cooling: refactor code and add get_voltage function
Move the code which gets the voltage for a given frequency.
This code will be resused in few places.

Acked-by: Javi Merino <javi.merino@kernel.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
2017-05-05 15:54:45 +08:00
Stafford Horne
17a9be3174 initramfs: Always do fput() and load modules after rootfs populate
In OpenRISC we do not have a bootloader passed initrd, but the built in
initramfs does contain the /init and other binaries, including modules.
The previous commit 0886551480 ("initramfs: finish fput() before
accessing any binary from initramfs") made a change to only call fput()
if the bootloader initrd was available, this caused intermittent crashes
for OpenRISC.

This patch changes the fput() to happen unconditionally if any rootfs is
loaded. Also, I added some comments to make it a bit more clear why we
call unpack_to_rootfs() multiple times.

Fixes: 0886551480 ("initramfs: finish fput() before accessing any binary from initramfs")
Cc: stable@vger.kernel.org
Cc: Lokesh Vutla <lokeshvutla@ti.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Stafford Horne <shorne@gmail.com>
2017-05-05 16:01:08 +09:00
Dan Williams
736163671b Merge branch 'for-4.12/dax' into libnvdimm-for-next 2017-05-04 23:38:43 -07:00
Matthias Kaehlcke
121843eb02 x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility
The constraint "rm" allows the compiler to put mix_const into memory.
When the input operand is a memory location then MUL needs an operand
size suffix, since Clang can't infer the multiplication width from the
operand.

Add and use the _ASM_MUL macro which determines the operand size and
resolves to the NUL instruction with the corresponding suffix.

This fixes the following error when building with clang:

  CC      arch/x86/lib/kaslr.o
  /tmp/kaslr-dfe1ad.s: Assembler messages:
  /tmp/kaslr-dfe1ad.s:182: Error: no instruction mnemonic suffix given and no register operands; can't size instruction

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Davidson <md@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170501224741.133938-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-05 08:31:05 +02:00
Scott Wood
61baf15555 powerpc/64e: Don't place the stack beyond TASK_SIZE
Commit f4ea6dcb08 ("powerpc/mm: Enable mappings above 128TB") increased
the task size on book3s, and introduced a mechanism to dynamically
control whether a task uses these larger addresses.  While the change to
the task size itself was ifdef-protected to only apply on book3s, the
change to STACK_TOP_USER64 was not.  On book3e, this had the effect of
trying to use addresses up to 128TiB for the stack despite a 64TiB task
size limit -- which broke 64-bit userspace producing the following errors:

Starting init: /sbin/init exists but couldn't execute it (error -14)
Starting init: /bin/sh exists but couldn't execute it (error -14)
Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.

Fixes: f4ea6dcb08 ("powerpc/mm: Enable mappings above 128TB")
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2017-05-05 01:22:06 -05:00
Baoquan He
fc5f9d5f15 x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds()
Jeff Moyer reported that on his system with two memory regions 0~64G and
1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling KASLR
will make the system hang intermittently during boot. While adding 'nokaslr'
won't.

The back trace is:

 Oops: 0000 [#1] SMP

 RIP: memcpy_erms()
 [ .... ]
 Call Trace:
  pmem_rw_page()
  bdev_read_page()
  do_mpage_readpage()
  mpage_readpages()
  blkdev_readpages()
  __do_page_cache_readahead()
  force_page_cache_readahead()
  page_cache_sync_readahead()
  generic_file_read_iter()
  blkdev_read_iter()
  __vfs_read()
  vfs_read()
  SyS_read()
  entry_SYSCALL_64_fastpath()

This crash happens because the for loop count calculation in sync_global_pgds()
is not correct. When a mapping area crosses PGD entries, we should
calculate the starting address of region which next PGD covers and assign
it to next for loop count, but not add PGDIR_SIZE directly. The old
code works right only if the mapping area is an exact multiple of PGDIR_SIZE,
otherwize the end region could be skipped so that it can't be synchronized
to all other processes from kernel PGD init_mm.pgd.

In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than
PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it
makes this area be mapped inside one PGD entry. With KASLR enabled,
this area could cross two PGD entries, then the next PGD entry won't
be synced to all other processes. That is why we saw empty PGD.

Fix it.

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-05 08:21:24 +02:00
Ingo Molnar
415812f2d6 Merge branch 'linus' into x86/urgent, to pick up dependent commits
We are going to fix a bug introduced by a more recent commit, so
refresh the tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-05 08:21:03 +02:00
Daniel Micay
5ea30e4e58 stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
The stack canary is an 'unsigned long' and should be fully initialized to
random data rather than only 32 bits of random data.

Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-05 08:05:13 +02:00