1
0
Commit Graph

47685 Commits

Author SHA1 Message Date
Linus Torvalds
c9341ee0af Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "Highlights:

   - major AppArmor update: policy namespaces & lots of fixes

   - add /sys/kernel/security/lsm node for easy detection of loaded LSMs

   - SELinux cgroupfs labeling support

   - SELinux context mounts on tmpfs, ramfs, devpts within user
     namespaces

   - improved TPM 2.0 support"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (117 commits)
  tpm: declare tpm2_get_pcr_allocation() as static
  tpm: Fix expected number of response bytes of TPM1.2 PCR Extend
  tpm xen: drop unneeded chip variable
  tpm: fix misspelled "facilitate" in module parameter description
  tpm_tis: fix the error handling of init_tis()
  KEYS: Use memzero_explicit() for secret data
  KEYS: Fix an error code in request_master_key()
  sign-file: fix build error in sign-file.c with libressl
  selinux: allow changing labels for cgroupfs
  selinux: fix off-by-one in setprocattr
  tpm: silence an array overflow warning
  tpm: fix the type of owned field in cap_t
  tpm: add securityfs support for TPM 2.0 firmware event log
  tpm: enhance read_log_of() to support Physical TPM event log
  tpm: enhance TPM 2.0 PCR extend to support multiple banks
  tpm: implement TPM 2.0 capability to get active PCR banks
  tpm: fix RC value check in tpm2_seal_trusted
  tpm_tis: fix iTPM probe via probe_itpm() function
  tpm: Begin the process to deprecate user_read_timer
  tpm: remove tpm_read_index and tpm_write_index from tpm.h
  ...
2017-02-21 12:49:56 -08:00
Linus Torvalds
772c8f6f3b Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:

 - blk-mq scheduling framework from me and Omar, with a port of the
   deadline scheduler for this framework. A port of BFQ from Paolo is in
   the works, and should be ready for 4.12.

 - Various fixups and improvements to the above scheduling framework
   from Omar, Paolo, Bart, me, others.

 - Cleanup of the exported sysfs blk-mq data into debugfs, from Omar.
   This allows us to export more information that helps debug hangs or
   performance issues, without cluttering or abusing the sysfs API.

 - Fixes for the sbitmap code, the scalable bitmap code that was
   migrated from blk-mq, from Omar.

 - Removal of the BLOCK_PC support in struct request, and refactoring of
   carrying SCSI payloads in the block layer. This cleans up the code
   nicely, and enables us to kill the SCSI specific parts of struct
   request, shrinking it down nicely. From Christoph mainly, with help
   from Hannes.

 - Support for ranged discard requests and discard merging, also from
   Christoph.

 - Support for OPAL in the block layer, and for NVMe as well. Mainly
   from Scott Bauer, with fixes/updates from various others folks.

 - Error code fixup for gdrom from Christophe.

 - cciss pci irq allocation cleanup from Christoph.

 - Making the cdrom device operations read only, from Kees Cook.

 - Fixes for duplicate bdi registrations and bdi/queue life time
   problems from Jan and Dan.

 - Set of fixes and updates for lightnvm, from Matias and Javier.

 - A few fixes for nbd from Josef, using idr to name devices and a
   workqueue deadlock fix on receive. Also marks Josef as the current
   maintainer of nbd.

 - Fix from Josef, overwriting queue settings when the number of
   hardware queues is updated for a blk-mq device.

 - NVMe fix from Keith, ensuring that we don't repeatedly mark and IO
   aborted, if we didn't end up aborting it.

 - SG gap merging fix from Ming Lei for block.

 - Loop fix also from Ming, fixing a race and crash between setting loop
   status and IO.

 - Two block race fixes from Tahsin, fixing request list iteration and
   fixing a race between device registration and udev device add
   notifiations.

 - Double free fix from cgroup writeback, from Tejun.

 - Another double free fix in blkcg, from Hou Tao.

 - Partition overflow fix for EFI from Alden Tondettar.

* tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits)
  nvme: Check for Security send/recv support before issuing commands.
  block/sed-opal: allocate struct opal_dev dynamically
  block/sed-opal: tone down not supported warnings
  block: don't defer flushes on blk-mq + scheduling
  blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers
  blk-mq: don't special case flush inserts for blk-mq-sched
  blk-mq-sched: don't add flushes to the head of requeue queue
  blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not
  block: do not allow updates through sysfs until registration completes
  lightnvm: set default lun range when no luns are specified
  lightnvm: fix off-by-one error on target initialization
  Maintainers: Modify SED list from nvme to block
  Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN
  uapi: sed-opal fix IOW for activate lsp to use correct struct
  cdrom: Make device operations read-only
  elevator: fix loading wrong elevator type for blk-mq devices
  cciss: switch to pci_irq_alloc_vectors
  block/loop: fix race between I/O and set_status
  blk-mq-sched: don't hold queue_lock when calling exit_icq
  block: set make_request_fn manually in blk_mq_update_nr_hw_queues
  ...
2017-02-21 10:57:33 -08:00
Linus Torvalds
9763dd6f81 Merge tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Robert Peterson:
 "We've got eight GFS2 patches for this merge window:

   - Andy Price submitted a patch to make gfs2_write_full_page a static
     function.

   - Dan Carpenter submitted a patch to fix a ERR_PTR thinko.

  Three patches fix bugs related to deleting very large files, which
  cause GFS2 to run out of journal space:

   - The first one prevents GFS2 delete operation from requesting too
     much journal space.

   - The second one fixes a problem whereby GFS2 can hang because it
     wasn't taking journal space demand into its calculations.

   - The third one wakes up IO waiters when a flush is done to restart
     processes stuck waiting for journal space to become available.

  The final three patches are a performance improvement related to
  spin_lock contention between multiple writers:

   - The "tr_touched" variable was switched to a flag to be more atomic
     and eliminate the possibility of some races.

   - Function meta_lo_add was moved inline with its only caller to make
     the code more readable and efficient.

   - Contention on the gfs2_log_lock spinlock was greatly reduced by
     avoiding the lock altogether in cases where we don't really need
     it: buffers that already appear in the appropriate metadata list
     for the journal. Many thanks to Steve Whitehouse for the ideas and
     principles behind these patches"

* tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Make gfs2_write_full_page static
  GFS2: Reduce contention on gfs2_log_lock
  GFS2: Inline function meta_lo_add
  GFS2: Switch tr_touched to flag in transaction
  GFS2: Wake up io waiters whenever a flush is done
  GFS2: Made logd daemon take into account log demand
  GFS2: Limit number of transaction blocks requested for truncates
  GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
2017-02-21 07:46:34 -08:00
Linus Torvalds
70fcf5c339 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes and cleanups from Jan Kara:
 "Several small UDF fixes and cleanups and a small cleanup of fanotify
  code"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: simplify the code of fanotify_merge
  udf: simplify udf_ioctl()
  udf: fix ioctl errors
  udf: allow implicit blocksize specification during mount
  udf: check partition reference in udf_read_inode()
  udf: atomically read inode size
  udf: merge module informations in super.c
  udf: remove next_epos from udf_update_extent_cache()
  udf: Factor out trimming of crtime
  udf: remove empty condition
  udf: remove unneeded line break
  udf: merge bh free
  udf: use pointer for kernel_long_ad argument
  udf: use __packed instead of __attribute__ ((packed))
  udf: Make stat on symlink report symlink length as st_size
  fs/udf: make #ifdef UDF_PREALLOCATE unconditional
  fs: udf: Replace CURRENT_TIME with current_time()
2017-02-21 07:44:03 -08:00
Linus Torvalds
2bfe01eff4 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS/SMB3 updates from Steve French:
 "Includes support for a critical SMB3 security feature: per-share
  encryption from Pavel, and a cleanup from Jean Delvare.

  Will have another cifs/smb3 merge next week"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Allow to switch on encryption with seal mount option
  CIFS: Add capability to decrypt big read responses
  CIFS: Decrypt and process small encrypted packets
  CIFS: Add copy into pages callback for a read operation
  CIFS: Add mid handle callback
  CIFS: Add transform header handling callbacks
  CIFS: Encrypt SMB3 requests before sending
  CIFS: Enable encryption during session setup phase
  CIFS: Add capability to transform requests before sending
  CIFS: Separate RFC1001 length processing for SMB2 read
  CIFS: Separate SMB2 sync header processing
  CIFS: Send RFC1001 length in a separate iov
  CIFS: Make send_cancel take rqst as argument
  CIFS: Make SendReceive2() takes resp iov
  CIFS: Separate SMB2 header structure
  CIFS: Fix splice read for non-cached files
  cifs: Add soft dependencies
  cifs: Only select the required crypto modules
  cifs: Simplify SMB2 and SMB311 dependencies
2017-02-20 18:38:47 -08:00
Linus Torvalds
cab7076a18 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "For this cycle we add support for the shutdown ioctl, which is
  primarily used for testing, but which can be useful on production
  systems when a scratch volume is being destroyed and the data on it
  doesn't need to be saved.

  This found (and we fixed) a number of bugs with ext4's recovery to
  corrupted file system --- the bugs increased the amount of data that
  could be potentially lost, and in the case of the inline data feature,
  could cause the kernel to BUG.

  Also included are a number of other bug fixes, including in ext4's
  fscrypt, DAX, inline data support"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
  ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWN
  ext4: fix fencepost in s_first_meta_bg validation
  ext4: don't BUG when truncating encrypted inodes on the orphan list
  ext4: do not use stripe_width if it is not set
  ext4: fix stripe-unaligned allocations
  dax: assert that i_rwsem is held exclusive for writes
  ext4: fix DAX write locking
  ext4: add EXT4_IOC_GOINGDOWN ioctl
  ext4: add shutdown bit and check for it
  ext4: rename s_resize_flags to s_ext4_flags
  ext4: return EROFS if device is r/o and journal replay is needed
  ext4: preserve the needs_recovery flag when the journal is aborted
  jbd2: don't leak modified metadata buffers on an aborted journal
  ext4: fix inline data error paths
  ext4: move halfmd4 into hash.c directly
  ext4: fix use-after-iput when fscrypt contexts are inconsistent
  jbd2: fix use after free in kjournald2()
  ext4: fix data corruption in data=journal mode
  ext4: trim allocation requests to group size
  ext4: replace BUG_ON with WARN_ON in mb_find_extent()
  ...
2017-02-20 18:24:39 -08:00
Linus Torvalds
6c24337f22 Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
 "Various cleanups for the file system encryption feature"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: constify struct fscrypt_operations
  fscrypt: properly declare on-stack completion
  fscrypt: split supp and notsupp declarations into their own headers
  fscrypt: remove redundant assignment of res
  fscrypt: make fscrypt_operations.key_prefix a string
  fscrypt: remove unused 'mode' member of fscrypt_ctx
  ext4: don't allow encrypted operations without keys
  fscrypt: make test_dummy_encryption require a keyring key
  fscrypt: factor out bio specific functions
  fscrypt: pass up error codes from ->get_context()
  fscrypt: remove user-triggerable warning messages
  fscrypt: use EEXIST when file already uses different policy
  fscrypt: use ENOTDIR when setting encryption policy on nondirectory
  fscrypt: use ENOKEY when file cannot be created w/o key
2017-02-20 18:22:31 -08:00
Linus Torvalds
42e1b14b6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement wraparound-safe refcount_t and kref_t types based on
     generic atomic primitives (Peter Zijlstra)

   - Improve and fix the ww_mutex code (Nicolai Hähnle)

   - Add self-tests to the ww_mutex code (Chris Wilson)

   - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
     Bueso)

   - Micro-optimize the current-task logic all around the core kernel
     (Davidlohr Bueso)

   - Tidy up after recent optimizations: remove stale code and APIs,
     clean up the code (Waiman Long)

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  fork: Fix task_struct alignment
  locking/spinlock/debug: Remove spinlock lockup detection code
  lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  lkdtm: Convert to refcount_t testing
  kref: Implement 'struct kref' using refcount_t
  refcount_t: Introduce a special purpose refcount type
  sched/wake_q: Clarify queue reinit comment
  sched/wait, rcuwait: Fix typo in comment
  locking/mutex: Fix lockdep_assert_held() fail
  locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
  locking/rwsem: Reinit wake_q after use
  locking/rwsem: Remove unnecessary atomic_long_t casts
  jump_labels: Move header guard #endif down where it belongs
  locking/atomic, kref: Implement kref_put_lock()
  locking/ww_mutex: Turn off __must_check for now
  locking/atomic, kref: Avoid more abuse
  locking/atomic, kref: Use kref_get_unless_zero() more
  locking/atomic, kref: Kill kref_sub()
  locking/atomic, kref: Add kref_read()
  locking/atomic, kref: Add KREF_INIT()
  ...
2017-02-20 13:23:30 -08:00
Linus Torvalds
828cad8ea0 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this (fairly busy) cycle were:

   - There was a class of scheduler bugs related to forgetting to update
     the rq-clock timestamp which can cause weird and hard to debug
     problems, so there's a new debug facility for this: which uncovered
     a whole lot of bugs which convinced us that we want to keep the
     debug facility.

     (Peter Zijlstra, Matt Fleming)

   - Various cputime related updates: eliminate cputime and use u64
     nanoseconds directly, simplify and improve the arch interfaces,
     implement delayed accounting more widely, etc. - (Frederic
     Weisbecker)

   - Move code around for better structure plus cleanups (Ingo Molnar)

   - Move IO schedule accounting deeper into the scheduler plus related
     changes to improve the situation (Tejun Heo)

   - ... plus a round of sched/rt and sched/deadline fixes, plus other
     fixes, updats and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
  sched/core: Remove unlikely() annotation from sched_move_task()
  sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
  sched/topology: Split out scheduler topology code from core.c into topology.c
  sched/core: Remove unnecessary #include headers
  sched/rq_clock: Consolidate the ordering of the rq_clock methods
  delayacct: Include <uapi/linux/taskstats.h>
  sched/core: Clean up comments
  sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
  sched/clock: Add dummy clear_sched_clock_stable() stub function
  sched/cputime: Remove generic asm headers
  sched/cputime: Remove unused nsec_to_cputime()
  s390, sched/cputime: Remove unused cputime definitions
  powerpc, sched/cputime: Remove unused cputime definitions
  s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
  ia64, sched/cputime: Remove unused cputime definitions
  ia64: Convert vtime to use nsec units directly
  ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
  sched/cputime: Remove jiffies based cputime
  sched/cputime, vtime: Return nsecs instead of cputime_t to account
  sched/cputime: Complete nsec conversion of tick based accounting
  ...
2017-02-20 12:52:55 -08:00
Theodore Ts'o
e9be2ac7c0 ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWN
It's very likely the file system independent ioctl name will be
FS_IOC_SHUTDOWN, so let's use the same name for the ext4 ioctl name.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-20 15:34:59 -05:00
Linus Torvalds
20dcfe1b7d Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Nothing exciting, just the usual pile of fixes, updates and cleanups:

   - A bunch of clocksource driver updates

   - Removal of CONFIG_TIMER_STATS and the related /proc file

   - More posix timer slim down work

   - A scalability enhancement in the tick broadcast code

   - Math cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  hrtimer: Catch invalid clockids again
  math64, tile: Fix build failure
  clocksource/drivers/arm_arch_timer:: Mark cyclecounter __ro_after_init
  timerfd: Protect the might cancel mechanism proper
  timer_list: Remove useless cast when printing
  time: Remove CONFIG_TIMER_STATS
  clocksource/drivers/arm_arch_timer: Work around Hisilicon erratum 161010101
  clocksource/drivers/arm_arch_timer: Introduce generic errata handling infrastructure
  clocksource/drivers/arm_arch_timer: Remove fsl-a008585 parameter
  clocksource/drivers/arm_arch_timer: Add dt binding for hisilicon-161010101 erratum
  clocksource/drivers/ostm: Add renesas-ostm timer driver
  clocksource/drivers/ostm: Document renesas-ostm timer DT bindings
  clocksource/drivers/tcb_clksrc: Use 32 bit tcb as sched_clock
  clocksource/drivers/gemini: Add driver for the Cortina Gemini
  clocksource: add DT bindings for Cortina Gemini
  clockevents: Add a clkevt-of mechanism like clksrc-of
  tick/broadcast: Reduce lock cacheline contention
  timers: Omit POSIX timer stuff from task_struct when disabled
  x86/timer: Make delay() work during early bootup
  delay: Add explanation of udelay() inaccuracy
  ...
2017-02-20 10:06:32 -08:00
Jens Axboe
818551e2b2 Merge branch 'for-4.11/next' into for-4.11/linus-merge
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17 14:08:19 -07:00
Miklos Szeredi
5a81e6a171 vfs: fix uninitialized flags in splice_to_pipe()
Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
unused part of the pipe ring buffer.  Previously splice_to_pipe() left
the flags value alone, which could result in incorrect behavior.

Uninitialized flags appears to have been there from the introduction of
the splice syscall.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # 2.6.17+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-16 09:09:02 -08:00
Miklos Szeredi
84588a93d0 fuse: fix uninitialized flags in pipe_buffer
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: d82718e348 ("fuse_dev_splice_read(): switch to add_to_pipe()")
Cc: <stable@vger.kernel.org> # 4.9+
2017-02-16 15:08:20 +01:00
Sahitya Tummala
6ba4d2722d fuse: fix use after free issue in fuse_dev_do_read()
There is a potential race between fuse_dev_do_write()
and request_wait_answer() contexts as shown below:

TASK 1:
__fuse_request_send():
  |--spin_lock(&fiq->waitq.lock);
  |--queue_request();
  |--spin_unlock(&fiq->waitq.lock);
  |--request_wait_answer():
       |--if (test_bit(FR_SENT, &req->flags))
       <gets pre-empted after it is validated true>
                                   TASK 2:
                                   fuse_dev_do_write():
                                     |--clears bit FR_SENT,
                                     |--request_end():
                                        |--sets bit FR_FINISHED
                                        |--spin_lock(&fiq->waitq.lock);
                                        |--list_del_init(&req->intr_entry);
                                        |--spin_unlock(&fiq->waitq.lock);
                                        |--fuse_put_request();
       |--queue_interrupt();
       <request gets queued to interrupts list>
            |--wake_up_locked(&fiq->waitq);
       |--wait_event_freezable();
       <as FR_FINISHED is set, it returns and then
       the caller frees this request>

Now, the next fuse_dev_do_read(), see interrupts list is not empty
and then calls fuse_read_interrupt() which tries to access the request
which is already free'd and gets the below crash:

[11432.401266] Unable to handle kernel paging request at virtual address
6b6b6b6b6b6b6b6b
...
[11432.418518] Kernel BUG at ffffff80083720e0
[11432.456168] PC is at __list_del_entry+0x6c/0xc4
[11432.463573] LR is at fuse_dev_do_read+0x1ac/0x474
...
[11432.679999] [<ffffff80083720e0>] __list_del_entry+0x6c/0xc4
[11432.687794] [<ffffff80082c65e0>] fuse_dev_do_read+0x1ac/0x474
[11432.693180] [<ffffff80082c6b14>] fuse_dev_read+0x6c/0x78
[11432.699082] [<ffffff80081d5638>] __vfs_read+0xc0/0xe8
[11432.704459] [<ffffff80081d5efc>] vfs_read+0x90/0x108
[11432.709406] [<ffffff80081d67f0>] SyS_read+0x58/0x94

As FR_FINISHED bit is set before deleting the intr_entry with input
queue lock in request completion path, do the testing of this flag and
queueing atomically with the same lock in queue_interrupt().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: fd22d62ed0 ("fuse: no fc->lock for iqueue parts")
Cc: <stable@vger.kernel.org> # 4.2+
2017-02-15 10:28:24 +01:00
Theodore Ts'o
2ba3e6e8af ext4: fix fencepost in s_first_meta_bg validation
It is OK for s_first_meta_bg to be equal to the number of block group
descriptor blocks.  (It rarely happens, but it shouldn't cause any
problems.)

https://bugzilla.kernel.org/show_bug.cgi?id=194567

Fixes: 3a4b77cd47
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-15 01:26:39 -05:00
Theodore Ts'o
0d06863f90 ext4: don't BUG when truncating encrypted inodes on the orphan list
Fix a BUG when the kernel tries to mount a file system constructed as
follows:

echo foo > foo.txt
mke2fs -Fq -t ext4 -O encrypt foo.img 100
debugfs -w foo.img << EOF
write foo.txt a
set_inode_field a i_flags 0x80800
set_super_value s_last_orphan 12
quit
EOF

root@kvm-xfstests:~# mount -o loop foo.img /mnt
[  160.238770] ------------[ cut here ]------------
[  160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874!
[  160.240106] invalid opcode: 0000 [#1] SMP
[  160.240106] Modules linked in:
[  160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G        W       4.10.0-rc3-00034-gcdd33b941b67 #227
[  160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[  160.240106] task: f4518000 task.stack: f47b6000
[  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4
[  160.240106] EFLAGS: 00010246 CPU: 0
[  160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007
[  160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac
[  160.240106]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0
[  160.240106] Call Trace:
[  160.240106]  ext4_truncate+0x1e9/0x3e5
[  160.240106]  ext4_fill_super+0x286f/0x2b1e
[  160.240106]  ? set_blocksize+0x2e/0x7e
[  160.240106]  mount_bdev+0x114/0x15f
[  160.240106]  ext4_mount+0x15/0x17
[  160.240106]  ? ext4_calculate_overhead+0x39d/0x39d
[  160.240106]  mount_fs+0x58/0x115
[  160.240106]  vfs_kern_mount+0x4b/0xae
[  160.240106]  do_mount+0x671/0x8c3
[  160.240106]  ? _copy_from_user+0x70/0x83
[  160.240106]  ? strndup_user+0x31/0x46
[  160.240106]  SyS_mount+0x57/0x7b
[  160.240106]  do_int80_syscall_32+0x4f/0x61
[  160.240106]  entry_INT80_32+0x2f/0x2f
[  160.240106] EIP: 0xb76b919e
[  160.240106] EFLAGS: 00000246 CPU: 0
[  160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8
[  160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660
[  160.240106]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[  160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9
[  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac
[  160.317241] ---[ end trace d6a773a375c810a5 ]---

The problem is that when the kernel tries to truncate an inode in
ext4_truncate(), it tries to clear any on-disk data beyond i_size.
Without the encryption key, it can't do that, and so it triggers a
BUG.

E2fsck does *not* provide this service, and in practice most file
systems have their orphan list processed by e2fsck, so to avoid
crashing, this patch skips this step if we don't have access to the
encryption key (which is the case when processing the orphan list; in
all other cases, we will have the encryption key, or the kernel
wouldn't have allowed the file to be opened).

An open question is whether the fact that e2fsck isn't clearing the
bytes beyond i_size causing problems --- and if we've lived with it
not doing it for so long, can we drop this from the kernel replay of
the orphan list in all cases (not just when we don't have the key for
encrypted inodes).

Addresses-Google-Bug: #35209576

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-14 11:31:15 -05:00
Linus Torvalds
2b95550a43 Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This has two last minute fixes. The highest priority here is a
  regression fix for the decompression code, but we also fixed up a
  problem with the 32-bit compat ioctls.

  The decompression bug could hand back the wrong data on big reads when
  zlib was used. I have a larger cleanup to make the math here less
  error prone, but at this stage in the release Omar's patch is the best
  choice"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix btrfs_decompress_buf2page()
  btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
2017-02-11 09:15:58 -08:00
Omar Sandoval
6e78b3f7a1 Btrfs: fix btrfs_decompress_buf2page()
If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.

Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.

A case I saw when testing with zlib was:

    buf_start = 42769
    total_out = 46865
    working_bytes = total_out - buf_start = 4096
    start_byte = 45056

The condition (total_out > start_byte && buf_start < start_byte) is
true, so we adjust the offset:

    buf_offset = start_byte - buf_start = 2287
    working_bytes -= buf_offset = 1809
    current_buf_start = buf_start = 42769

Then, we copy

    bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
    buf_offset += bytes = 4096
    working_bytes -= bytes = 0
    current_buf_start += bytes = 44578

After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
which is true! So, we adjust the values again:

    buf_offset = start_byte - buf_start = 2287
    working_bytes = total_out - start_byte = 1809
    current_buf_start = buf_start + buf_offset = 45056

But note that working_bytes was already zero before this, so we should
have stopped copying.

Fixes: 974b1adc3b ("btrfs: use bio iterators for the decompression handlers")
Reported-by: Pat Erley <pat-lkml@erley.org>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Liu Bo <bo.li.liu@oracle.com>
2017-02-10 19:11:03 -08:00
Linus Torvalds
7fe654dca2 Merge tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux
Pull nfsd revert from Bruce Fields:
 "This patch turned out to have a couple problems. The problems are
  fixable, but at least one of the fixes is a little ugly. The original
  bug has always been there, so we can wait another week or two to get
  this right"

* tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux:
  nfsd: Revert "nfsd: special case truncates some more"
2017-02-10 14:23:45 -08:00
Thomas Gleixner
1e38da300e timerfd: Protect the might cancel mechanism proper
The handling of the might_cancel queueing is not properly protected, so
parallel operations on the file descriptor can race with each other and
lead to list corruptions or use after free.

Protect the context for these operations with a seperate lock.

The wait queue lock cannot be reused for this because that would create a
lock inversion scenario vs. the cancel lock. Replacing might_cancel with an
atomic (atomic_t or atomic bit) does not help either because it still can
race vs. the actual list operation.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "linux-fsdevel@vger.kernel.org"
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701311521430.3457@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10 11:15:09 +01:00
Jan Kara
5469d7c308 ext4: do not use stripe_width if it is not set
Avoid using stripe_width for sbi->s_stripe value if it is not actually
set. It prevents using the stride for sbi->s_stripe.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-10 00:56:09 -05:00
Jan Kara
d9b22cf9f5 ext4: fix stripe-unaligned allocations
When a filesystem is created using:

	mkfs.ext4 -b 4096 -E stride=512 <dev>

and we try to allocate 64MB extent, we will end up directly in
ext4_mb_complex_scan_group(). This is because the request is detected
as power-of-two allocation (so we start in ext4_mb_regular_allocator()
with ac_criteria == 0) however the check before
ext4_mb_simple_scan_group() refuses the direct buddy scan because the
allocation request is too large. Since cr == 0, the check whether we
should use ext4_mb_scan_aligned() fails as well and we fall back to
ext4_mb_complex_scan_group().

Fix the problem by checking for upper limit on power-of-two requests
directly when detecting them.

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-10 00:50:56 -05:00
J. Bruce Fields
0839ffb83e nfsd: Revert "nfsd: special case truncates some more"
This patch incorrectly attempted nested mnt_want_write, and incorrectly
disabled nfsd's owner override for truncate.  We'll fix those problems
and make another attempt soon, for the moment I think the safest is to
revert.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-09 20:49:20 -05:00
Brian Norris
8672aed7bd pstore: don't OOPS when there are no ftrace zones
We'll OOPS in ramoops_get_next_prz() if the platform didn't ask for any
ftrace zones (i.e., cxt->fprzs will be NULL). Let's just skip this
entire FTRACE section if there's no 'fprzs'.

Regression seen on a coreboot/depthcharge-based Chromebook.

Fixes: 2fbea82bbb ("pstore: Merge per-CPU ftrace records into one")
Cc: Joel Fernandes <joelaf@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-02-09 11:49:49 -08:00
Kinglong Mee
6c71100db5 fanotify: simplify the code of fanotify_merge
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-09 14:09:22 +01:00
Christoph Hellwig
168316db35 dax: assert that i_rwsem is held exclusive for writes
Make sure all callers follow the same locking protocol, given that DAX
transparantly replaced the normal buffered I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-02-08 14:43:13 -05:00
Christoph Hellwig
ff5462e39c ext4: fix DAX write locking
Unlike O_DIRECT DAX is not an optional opt-in feature selected by the
application, so we'll have to provide the traditional synchronіzation
of overlapping writes as we do for buffered writes.

This was broken historically for DAX, but got fixed for ext2 and XFS
as part of the iomap conversion.  Fix up ext4 as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-02-08 14:39:27 -05:00
Jeff Mahoney
2a36224918 btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
Commit 4c63c2454e incorrectly assumed that returning -ENOIOCTLCMD would
cause the native ioctl to be called.  The ->compat_ioctl callback is
expected to handle all ioctls, not just compat variants.  As a result,
when using 32-bit userspace on 64-bit kernels, everything except those
three ioctls would return -ENOTTY.

Fixes: 4c63c2454e ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-08 17:47:30 +01:00
Eric Biggers
6f69f0ed61 fscrypt: constify struct fscrypt_operations
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Richard Weinberger <richard@nod.at>
2017-02-08 10:59:57 -05:00
Hugh Dickins
b6789123bc mm: fix KPF_SWAPCACHE in /proc/kpageflags
Commit 6326fec112 ("mm: Use owner_priv bit for PageSwapCache, valid
when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and
depending on PageSwapBacked being true).

As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be
synthesized, instead of being shown on unrelated pages which just happen
to have PG_owner_priv_1 set.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-07 12:08:32 -08:00
Richard Weinberger
b14c8e6afd fscrypt: properly declare on-stack completion
When a completion is declared on-stack we have to use
COMPLETION_INITIALIZER_ONSTACK().

Fixes: 0b81d07790 ("fs crypto: move per-file encryption from f2fs
tree to fs/crypto")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06 23:45:28 -05:00
Eric Biggers
46f47e4800 fscrypt: split supp and notsupp declarations into their own headers
Previously, each filesystem configured without encryption support would
define all the public fscrypt functions to their notsupp_* stubs.  This
list of #defines had to be updated in every filesystem whenever a change
was made to the public fscrypt functions.  To make things more
maintainable now that we have three filesystems using fscrypt, split the
old header fscrypto.h into several new headers.  fscrypt_supp.h contains
the real declarations and is included by filesystems when configured
with encryption support, whereas fscrypt_notsupp.h contains the inline
stubs and is included by filesystems when configured without encryption
support.  fscrypt_common.h contains common declarations needed by both.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06 23:26:43 -05:00
Colin Ian King
02680b31a0 fscrypt: remove redundant assignment of res
res is assigned to sizeof(ctx), however, this is unused and res
is updated later on without that assigned value to res ever being
used.  Remove this redundant assignment.

Fixes CoverityScan CID#1395546 "Unused value"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06 23:25:53 -05:00
Theodore Ts'o
783d948544 ext4: add EXT4_IOC_GOINGDOWN ioctl
This ioctl is modeled after the xfs's XFS_IOC_GOINGDOWN ioctl.  (In
fact, it uses the same code points.)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05 19:47:14 -05:00
Theodore Ts'o
0db1ff222d ext4: add shutdown bit and check for it
Add a shutdown bit that will cause ext4 processing to fail immediately
with EIO.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05 01:28:48 -05:00
Theodore Ts'o
9549a168bd ext4: rename s_resize_flags to s_ext4_flags
We are currently using one bit in s_resize_flags; rename it in order
to allow more of the bits in that unsigned long for other purposes.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05 01:27:48 -05:00
Theodore Ts'o
4753d8a24d ext4: return EROFS if device is r/o and journal replay is needed
If the file system requires journal recovery, and the device is
read-ony, return EROFS to the mount system call.  This allows xfstests
generic/050 to pass.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-05 01:26:48 -05:00
Theodore Ts'o
97abd7d4b5 ext4: preserve the needs_recovery flag when the journal is aborted
If the journal is aborted, the needs_recovery feature flag should not
be removed.  Otherwise, it's the journal might not get replayed and
this could lead to more data getting lost.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-04 23:38:06 -05:00
Theodore Ts'o
e112666b49 jbd2: don't leak modified metadata buffers on an aborted journal
If the journal has been aborted, we shouldn't mark the underlying
buffer head as dirty, since that will cause the metadata block to get
modified.  And if the journal has been aborted, we shouldn't allow
this since it will almost certainly lead to a corrupted file system.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-04 23:14:19 -05:00
Theodore Ts'o
eb5efbcb76 ext4: fix inline data error paths
The write_end() function must always unlock the page and drop its ref
count, even on an error.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-04 23:04:00 -05:00
Michal Hocko
d1908f5255 fs: break out of iomap_file_buffered_write on fatal signals
Tetsuo has noticed that an OOM stress test which performs large write
requests can cause the full memory reserves depletion.  He has tracked
this down to the following path

	__alloc_pages_nodemask+0x436/0x4d0
	alloc_pages_current+0x97/0x1b0
	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
	? iomap_write_end+0x80/0x80             fs/iomap.c:150
	iomap_apply+0xb3/0x130                  fs/iomap.c:79
	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
	? iomap_write_end+0x80/0x80
	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
	? remove_wait_queue+0x59/0x60
	xfs_file_write_iter+0x90/0x130 [xfs]
	__vfs_write+0xe5/0x140
	vfs_write+0xc7/0x1f0
	? syscall_trace_enter+0x1d0/0x380
	SyS_write+0x58/0xc0
	do_syscall_64+0x6c/0x200
	entry_SYSCALL64_slow_path+0x25/0x25

the oom victim has access to all memory reserves to make a forward
progress to exit easier.  But iomap_file_buffered_write and other
callers of iomap_apply loop to complete the full request.  We need to
check for fatal signals and back off with a short write instead.

As the iomap_apply delegates all the work down to the actor we have to
hook into those.  All callers that work with the page cache are calling
iomap_write_begin so we will check for signals there.  dax_iomap_actor
has to handle the situation explicitly because it copies data to the
userspace directly.  Other callers like iomap_page_mkwrite work on a
single page or iomap_fiemap_actor do not allocate memory based on the
given len.

Fixes: 68a9f5e700 ("xfs: implement iomap based buffered write path")
Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 14:13:19 -08:00
Fabian Frederick
a074faad51 udf: simplify udf_ioctl()
"out" label was only returning error code.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-03 16:24:18 +01:00
Fabian Frederick
782deb2eec udf: fix ioctl errors
Currently, lsattr for instance in udf directory gives
"udf: Invalid argument While reading flags on ..."

This patch returns -ENOIOCTLCMD
when command is unknown to have more accurate message like this:
"Inappropriate ioctl for device While reading flags on ..."

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-03 16:19:54 +01:00
Andrew Price
c548a1c175 gfs2: Make gfs2_write_full_page static
It only gets called from aops.c and doesn't appear in any headers.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-02-03 08:23:47 -05:00
Linus Torvalds
1fc576b82b Merge tag 'nfsd-4.10-2' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Three more miscellaneous nfsd bugfixes"

* tag 'nfsd-4.10-2' of git://linux-nfs.org/~bfields/linux:
  svcrpc: fix oops in absence of krb5 module
  nfsd: special case truncates some more
  NFSD: Fix a null reference case in find_or_create_lock_stateid()
2017-02-02 12:49:58 -08:00
Omar Sandoval
a7c5437b0b debugfs: add debugfs_lookup()
We don't always have easy access to the dentry of a file or directory we
created in debugfs. Add a helper which allows us to get a dentry we
previously created.

The motivation for this change is a problem with blktrace and the blk-mq
debugfs entries introduced in 07e4fead45 ("blk-mq: create debugfs
directory tree"). Namely, in some cases, the directory that blktrace
needs to create may already exist, but in other cases, it may not. We
_could_ rely on a bunch of implied knowledge to decide whether to create
the directory or not, but it's much cleaner on our end to just look it
up.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 10:20:16 -07:00
Jason A. Donenfeld
1c83a9aab8 ext4: move halfmd4 into hash.c directly
The "half md4" transform should not be used by any new code. And
fortunately, it's only used now by ext4. Since ext4 supports several
hashing methods, at some point it might be desirable to move to
something like SipHash. As an intermediate step, remove half md4 from
cryptohash.h and lib, and make it just a local function in ext4's
hash.c. There's precedent for doing this; the other function ext can use
for its hashes -- TEA -- is also implemented in the same place. Also, by
being a local function, this might allow gcc to perform some additional
optimizations.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-02 11:52:14 -05:00
Jan Kara
efa7c9f97e block: Get rid of blk_get_backing_dev_info()
blk_get_backing_dev_info() is now a simple dereference. Remove that
function and simplify some code around that.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 08:21:32 -07:00
Jan Kara
b1d2dc5659 block: Make blk_get_backing_dev_info() safe without open bdev
Currenly blk_get_backing_dev_info() is not safe to be called when the
block device is not open as bdev->bd_disk is NULL in that case. However
inode_to_bdi() uses this function and may be call called from flusher
worker or other writeback related functions without bdev being open
which leads to crashes such as:

[113031.075540] Unable to handle kernel paging request for data at address 0x00000000
[113031.075614] Faulting instruction address: 0xc0000000003692e0
0:mon> t
[c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
[c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
[c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
[c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
[c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
[c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
[c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
[c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 08:20:53 -07:00