| author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-03-20 19:08:56 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-03-20 19:08:56 -0700 |
| commit | 643ad15d47410d37d43daf3ef1c8ac52c281efa5 | |
| tree | a864860cfe04c994c03d7946e12b3351e38a168b | |
| parent | 24b5e20f11a75866bbffc46c30a22fa50612a769 | |
| parent | 0d47638f80a02b15869f1fe1fc09e5bf996750fd | |
Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
"This tree adds support for a new memory protection hardware feature
that is available in upcoming Intel CPUs: 'protection keys' (pkeys).
There's a background article at LWN.net:
https://lwn.net/Articles/643797/
The gist is that protection keys allow the encoding of
user-controllable permission masks in the pte (page table entry). So
instead of having a fixed protection mask in the pte (which needs a
system call to change and works on a per-page basis), the user can map
a handful of protection mask variants and can change the masks at
runtime relatively cheaply, without having to change every single page
in the affected
virtual memory range.
This allows the dynamic switching of the protection bits of large
amounts of virtual memory, via user-space instructions. It also
allows more precise control of MMU permission bits: for example the
executable bit is separate from the read bit (see more about that
below).
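As an illustration of that cheap runtime switch, here is a minimal
user-space sketch (the helper names are ours, not a kernel API; it
assumes a CPU and kernel with pkeys/OSPKE enabled). PKRU holds two bits
per protection key, access-disable and write-disable, and is read and
written with the unprivileged RDPKRU/WRPKRU instructions, so changing
permissions is a single register write rather than a page-table update:

    #include <stdint.h>

    /* illustrative helpers (names are ours) */
    static inline uint32_t rdpkru(void)
    {
            uint32_t pkru, edx;
            /* RDPKRU (0f 01 ee): ECX must be 0, result in EAX */
            asm volatile(".byte 0x0f,0x01,0xee"
                         : "=a" (pkru), "=d" (edx) : "c" (0));
            return pkru;
    }

    static inline void wrpkru(uint32_t pkru)
    {
            /* WRPKRU (0f 01 ef): EAX is written, ECX/EDX must be 0 */
            asm volatile(".byte 0x0f,0x01,0xef"
                         : : "a" (pkru), "c" (0), "d" (0));
    }

    /* revoke all data access to memory tagged with 'pkey': one
     * register write covers every page carrying that key */
    static inline void pkey_disable_access(int pkey)
    {
            wrpkru(rdpkru() | (1u << (2 * pkey)));
    }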
This tree adds the MM infrastructure and low-level x86 glue needed for
that, plus it adds a high-level API to make use of protection keys -
if a user-space application calls:
mmap(..., PROT_EXEC);
or
mprotect(ptr, sz, PROT_EXEC);
(note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
this special case, and will set a special protection key on this
memory range. It also sets the appropriate bits in the Protection
Keys User Rights (PKRU) register so that the memory becomes unreadable
and unwritable.
So using protection keys, the kernel is able to implement 'true'
PROT_EXEC on x86 CPUs: without protection keys, PROT_EXEC implies
PROT_READ as well. Unreadable executable mappings have security
advantages: they cannot be read via information leaks to figure out
ASLR details, nor can they be scanned for ROP gadgets - and they
cannot be used by exploits for data purposes either.
We know of no user-space code that relies on pure PROT_EXEC
mappings today, but binary loaders could start making use of this new
feature to map binaries and libraries in a more secure fashion.
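As a sketch of what such a loader might do (a hypothetical helper based
on the mprotect() call above, error paths trimmed; on CPUs without
pkeys the mapping simply remains readable, as before):

    #include <string.h>
    #include <sys/mman.h>

    /* hypothetical loader fragment: place 'code' in an
     * execute-only mapping */
    static void *map_execute_only(const void *code, size_t len)
    {
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                    return NULL;
            memcpy(p, code, len);
            /* PROT_EXEC only, no PROT_READ/WRITE: on pkeys-capable
             * CPUs the kernel assigns the execute-only key here */
            if (mprotect(p, len, PROT_EXEC) != 0) {
                    munmap(p, len);
                    return NULL;
            }
            return p;
    }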
There is other pending pkeys work that offers more high-level system
call APIs to manage protection keys - but those are not part of this
pull request.
Right now there's a Kconfig option that controls this feature
(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS); it is enabled by default
(like most x86 CPU feature enablement code that has no runtime
overhead), but it's not user-configurable at the moment. If there's
any serious problem with this then we can make it configurable and/or
flip the default"
* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
mm/core, x86/mm/pkeys: Add execute-only protection keys support
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
x86/mm/pkeys: Allow kernel to modify user pkey rights register
x86/fpu: Allow setting of XSAVE state
x86/mm: Factor out LDT init from context init
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
x86/mm/pkeys: Add Kconfig prompt to existing config option
x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
x86/mm/pkeys: Dump PKRU with other kernel registers
mm/core, x86/mm/pkeys: Differentiate instruction fetches
x86/mm/pkeys: Optimize fault handling in access_error()
mm/core: Do not enforce PKEY permissions on remote mm access
um, pkeys: Add UML arch_*_access_permitted() methods
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
x86/mm/gup: Simplify get_user_pages() PTE bit handling
...
85 files changed, 1406 insertions, 241 deletions
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1f780d907718..ecc74fa4bfde 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -987,6 +987,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			See Documentation/x86/intel_mpx.txt for more
 			information about the feature.
 
+	nopku		[X86] Disable Memory Protection Keys CPU feature found
+			in some Intel CPUs.
+
 	eagerfpu=	[X86]
 			on	enable eager fpu restore
 			off	disable eager fpu restore
diff --git a/arch/cris/arch-v32/drivers/cryptocop.c b/arch/cris/arch-v32/drivers/cryptocop.c
index 877da1908234..617645d21b20 100644
--- a/arch/cris/arch-v32/drivers/cryptocop.c
+++ b/arch/cris/arch-v32/drivers/cryptocop.c
@@ -2719,9 +2719,7 @@ static int cryptocop_ioctl_process(struct inode *inode, struct file *filp, unsig
 	/* Acquire the mm page semaphore. */
 	down_read(&current->mm->mmap_sem);
 
-	err = get_user_pages(current,
-			     current->mm,
-			     (unsigned long int)(oper.indata + prev_ix),
+	err = get_user_pages((unsigned long int)(oper.indata + prev_ix),
 			     noinpages,
 			     0,  /* read access only for in data */
 			     0, /* no force */
@@ -2736,9 +2734,7 @@ static int cryptocop_ioctl_process(struct inode *inode, struct file *filp, unsig
 	}
 	noinpages = err;
 	if (oper.do_cipher){
-		err = get_user_pages(current,
-				     current->mm,
-				     (unsigned long int)oper.cipher_outdata,
+		err = get_user_pages((unsigned long int)oper.cipher_outdata,
 				     nooutpages,
 				     1, /* write access for out data */
 				     0, /* no force */
diff --git a/arch/ia64/include/uapi/asm/siginfo.h b/arch/ia64/include/uapi/asm/siginfo.h
index bce9bc1a66c4..f72bf0172bb2 100644
--- a/arch/ia64/include/uapi/asm/siginfo.h
+++ b/arch/ia64/include/uapi/asm/siginfo.h
@@ -63,10 +63,15 @@ typedef struct siginfo {
 			unsigned int _flags;	/* see below */
 			unsigned long _isr;	/* isr */
 			short _addr_lsb;	/* lsb of faulting address */
-			struct {
-				void __user *_lower;
-				void __user *_upper;
-			} _addr_bnd;
+			union {
+				/* used when si_code=SEGV_BNDERR */
+				struct {
+					void __user *_lower;
+					void __user *_upper;
+				} _addr_bnd;
+				/* used when si_code=SEGV_PKUERR */
+				__u32 _pkey;
+			};
 		} _sigfault;
 
 		/* SIGPOLL */
diff --git a/arch/ia64/kernel/err_inject.c b/arch/ia64/kernel/err_inject.c
index 0c161ed6d18e..09f845793d12 100644
--- a/arch/ia64/kernel/err_inject.c
+++ b/arch/ia64/kernel/err_inject.c
@@ -142,8 +142,7 @@ store_virtual_to_phys(struct device *dev, struct device_attribute *attr,
 	u64 virt_addr=simple_strtoull(buf, NULL, 16);
 	int ret;
 
-	ret = get_user_pages(current, current->mm, virt_addr,
-			1, VM_READ, 0, NULL, NULL);
+	ret = get_user_pages(virt_addr, 1, VM_READ, 0, NULL, NULL);
 	if (ret<=0) {
 #ifdef ERR_INJ_DEBUG
 		printk("Virtual address %lx is not existing.\n",virt_addr);
diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
index 2cb7fdead570..cc49dc240d67 100644
--- a/arch/mips/include/uapi/asm/siginfo.h
+++ b/arch/mips/include/uapi/asm/siginfo.h
@@ -86,10 +86,15 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb;
-			struct {
-				void __user *_lower;
-				void __user *_upper;
-			} _addr_bnd;
+			union {
+				/* used when si_code=SEGV_BNDERR */
+				struct {
+					void __user *_lower;
+					void __user *_upper;
+				} _addr_bnd;
+				/* used when si_code=SEGV_PKUERR */
+				__u32 _pkey;
+			};
 		} _sigfault;
 
 		/* SIGPOLL, SIGXFSZ (To do ...)  */
diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 6cdffc76735c..42d124fb6474 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -286,8 +286,7 @@ slow_irqon:
 	start += nr << PAGE_SHIFT;
 	pages += nr;
 
-	ret = get_user_pages_unlocked(current, mm, start,
-				      (end - start) >> PAGE_SHIFT,
+	ret = get_user_pages_unlocked(start, (end - start) >> PAGE_SHIFT,
 				      write, 0, pages);
 
 	/* Have to be a bit careful with return values */
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 8565c254151a..2563c435a4b1 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -18,11 +18,12 @@
  * This file is included by linux/mman.h, so we can't use cacl_vm_prot_bits()
  * here.  How important is the optimization?
  */
-static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot)
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+		unsigned long pkey)
 {
 	return (prot & PROT_SAO) ? VM_SAO : 0;
 }
-#define arch_calc_vm_prot_bits(prot) arch_calc_vm_prot_bits(prot)
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
 
 static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 {
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 878c27771717..4eaab40e3ade 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -148,5 +148,17 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool execute, bool foreign)
+{
+	/* by default, allow everything */
+	return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+	/* by default, allow everything */
+	return true;
+}
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index e485817f7b1a..d321469eeda7 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -136,4 +136,16 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool execute, bool foreign)
+{
+	/* by default, allow everything */
+	return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+	/* by default, allow everything */
+	return true;
+}
 #endif /* __S390_MMU_CONTEXT_H */
diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 13dab0c1645c..49a1c84ed266 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -210,7 +210,6 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages)
 {
-	struct mm_struct *mm = current->mm;
 	int nr, ret;
 
 	might_sleep();
@@ -222,8 +221,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	/* Try to get the remaining pages with get_user_pages */
 	start += nr << PAGE_SHIFT;
 	pages += nr;
-	ret = get_user_pages_unlocked(current, mm, start,
-				      nr_pages - nr, write, 0, pages);
+	ret = get_user_pages_unlocked(start, nr_pages - nr, write, 0, pages);
 	/* Have to be a bit careful with return values */
 	if (nr > 0)
 		ret = (ret < 0) ? nr : ret + nr;
diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
index e7af6a65baab..40fa6c8adc43 100644
--- a/arch/sh/mm/gup.c
+++ b/arch/sh/mm/gup.c
@@ -257,7 +257,7 @@ slow_irqon:
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		ret = get_user_pages_unlocked(current, mm, start,
+		ret = get_user_pages_unlocked(start,
 			(end - start) >> PAGE_SHIFT, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index eb3d8e8ebc6b..4e06750a5d29 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -237,7 +237,7 @@ slow:
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		ret = get_user_pages_unlocked(current, mm, start,
+		ret = get_user_pages_unlocked(start,
 			(end - start) >> PAGE_SHIFT, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
index 941527e507f7..1a60e1328e2f 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -27,6 +27,20 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 				     struct vm_area_struct *vma)
 {
 }
+
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool execute, bool foreign)
+{
+	/* by default, allow everything */
+	return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+	/* by default, allow everything */
+	return true;
+}
+
 /*
  * end asm-generic/mm_hooks.h functions
  */
diff --git a/arch/unicore32/include/asm/mmu_context.h b/arch/unicore32/include/asm/mmu_context.h
index 1cb5220afaf9..e35632ef23c7 100644
--- a/arch/unicore32/include/asm/mmu_context.h
+++ b/arch/unicore32/include/asm/mmu_context.h
@@ -97,4 +97,16 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool foreign)
+{
+	/* by default, allow everything */
+	return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+	/* by default, allow everything */
+	return true;
+}
 #endif
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d07cca6ad37b..8b680a5cb25b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -156,6 +156,8 @@ config X86
 	select X86_DEV_DMA_OPS			if X86_64
 	select X86_FEATURE_NAMES		if PROC_FS
 	select HAVE_STACK_VALIDATION		if X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS		if X86_INTEL_MEMORY_PROTECTION_KEYS
+	select ARCH_HAS_PKEYS			if X86_INTEL_MEMORY_PROTECTION_KEYS
 
 config INSTRUCTION_DECODER
 	def_bool y
@@ -1719,6 +1721,20 @@ config X86_INTEL_MPX
 
 	  If unsure, say N.
 
+config X86_INTEL_MEMORY_PROTECTION_KEYS
+	prompt "Intel Memory Protection Keys"
+	def_bool y
+	# Note: only available in 64-bit mode
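A note on the tree-wide churn in the hunks above: alongside the pkeys
work, this series also drops the explicit tsk/mm arguments from the
get_user_pages() family, which is why every converted caller loses its
first two parameters and now operates on current->mm implicitly. A
minimal sketch of a converted caller, using the post-merge prototype
visible above (the helper name is ours):

    #include <linux/errno.h>
    #include <linux/mm.h>
    #include <linux/sched.h>

    /* sketch: pin a single page of the current process */
    static int pin_one_user_page(unsigned long addr, struct page **page)
    {
            long ret;

            down_read(&current->mm->mmap_sem);
            ret = get_user_pages(addr, 1,
                                 0,        /* not for write */
                                 0,        /* no force */
                                 page, NULL);
            up_read(&current->mm->mmap_sem);

            return (ret == 1) ? 0 : -EFAULT;
    }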
