diff options
Diffstat (limited to 'Documentation/arch/x86')
44 files changed, 8681 insertions, 0 deletions
diff --git a/Documentation/arch/x86/amd-memory-encryption.rst b/Documentation/arch/x86/amd-memory-encryption.rst new file mode 100644 index 000000000000..934310ce7258 --- /dev/null +++ b/Documentation/arch/x86/amd-memory-encryption.rst @@ -0,0 +1,133 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================== +AMD Memory Encryption +===================== + +Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are +features found on AMD processors. + +SME provides the ability to mark individual pages of memory as encrypted using +the standard x86 page tables. A page that is marked encrypted will be +automatically decrypted when read from DRAM and encrypted when written to +DRAM. SME can therefore be used to protect the contents of DRAM from physical +attacks on the system. + +SEV enables running encrypted virtual machines (VMs) in which the code and data +of the guest VM are secured so that a decrypted version is available only +within the VM itself. SEV guest VMs have the concept of private and shared +memory. Private memory is encrypted with the guest-specific key, while shared +memory may be encrypted with hypervisor key. When SME is enabled, the hypervisor +key is the same key which is used in SME. + +A page is encrypted when a page table entry has the encryption bit set (see +below on how to determine its position). The encryption bit can also be +specified in the cr3 register, allowing the PGD table to be encrypted. Each +successive level of page tables can also be encrypted by setting the encryption +bit in the page table entry that points to the next table. This allows the full +page table hierarchy to be encrypted. Note, this means that just because the +encryption bit is set in cr3, doesn't imply the full hierarchy is encrypted. +Each page table entry in the hierarchy needs to have the encryption bit set to +achieve that. So, theoretically, you could have the encryption bit set in cr3 +so that the PGD is encrypted, but not set the encryption bit in the PGD entry +for a PUD which results in the PUD pointed to by that entry to not be +encrypted. + +When SEV is enabled, instruction pages and guest page tables are always treated +as private. All the DMA operations inside the guest must be performed on shared +memory. Since the memory encryption bit is controlled by the guest OS when it +is operating in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware +forces the memory encryption bit to 1. + +Support for SME and SEV can be determined through the CPUID instruction. The +CPUID function 0x8000001f reports information related to SME:: + + 0x8000001f[eax]: + Bit[0] indicates support for SME + Bit[1] indicates support for SEV + 0x8000001f[ebx]: + Bits[5:0] pagetable bit number used to activate memory + encryption + Bits[11:6] reduction in physical address space, in bits, when + memory encryption is enabled (this only affects + system physical addresses, not guest physical + addresses) + +If support for SME is present, MSR 0xc00100010 (MSR_AMD64_SYSCFG) can be used to +determine if SME is enabled and/or to enable memory encryption:: + + 0xc0010010: + Bit[23] 0 = memory encryption features are disabled + 1 = memory encryption features are enabled + +If SEV is supported, MSR 0xc0010131 (MSR_AMD64_SEV) can be used to determine if +SEV is active:: + + 0xc0010131: + Bit[0] 0 = memory encryption is not active + 1 = memory encryption is active + +Linux relies on BIOS to set this bit if BIOS has determined that the reduction +in the physical address space as a result of enabling memory encryption (see +CPUID information above) will not conflict with the address space resource +requirements for the system. If this bit is not set upon Linux startup then +Linux itself will not set it and memory encryption will not be possible. + +The state of SME in the Linux kernel can be documented as follows: + + - Supported: + The CPU supports SME (determined through CPUID instruction). + + - Enabled: + Supported and bit 23 of MSR_AMD64_SYSCFG is set. + + - Active: + Supported, Enabled and the Linux kernel is actively applying + the encryption bit to page table entries (the SME mask in the + kernel is non-zero). + +SME can also be enabled and activated in the BIOS. If SME is enabled and +activated in the BIOS, then all memory accesses will be encrypted and it will +not be necessary to activate the Linux memory encryption support. If the BIOS +merely enables SME (sets bit 23 of the MSR_AMD64_SYSCFG), then Linux can activate +memory encryption by default (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y) or +by supplying mem_encrypt=on on the kernel command line. However, if BIOS does +not enable SME, then Linux will not be able to activate memory encryption, even +if configured to do so by default or the mem_encrypt=on command line parameter +is specified. + +Secure Nested Paging (SNP) +========================== + +SEV-SNP introduces new features (SEV_FEATURES[1:63]) which can be enabled +by the hypervisor for security enhancements. Some of these features need +guest side implementation to function correctly. The below table lists the +expected guest behavior with various possible scenarios of guest/hypervisor +SNP feature support. + ++-----------------+---------------+---------------+------------------+ +| Feature Enabled | Guest needs | Guest has | Guest boot | +| by the HV | implementation| implementation| behaviour | ++=================+===============+===============+==================+ +| No | No | No | Boot | +| | | | | ++-----------------+---------------+---------------+------------------+ +| No | Yes | No | Boot | +| | | | | ++-----------------+---------------+---------------+------------------+ +| No | Yes | Yes | Boot | +| | | | | ++-----------------+---------------+---------------+------------------+ +| Yes | No | No | Boot with | +| | | | feature enabled | ++-----------------+---------------+---------------+------------------+ +| Yes | Yes | No | Graceful boot | +| | | | failure | ++-----------------+---------------+---------------+------------------+ +| Yes | Yes | Yes | Boot with | +| | | | feature enabled | ++-----------------+---------------+---------------+------------------+ + +More details in AMD64 APM[1] Vol 2: 15.34.10 SEV_STATUS MSR + +[1] https://www.amd.com/system/files/TechDocs/40332.pdf diff --git a/Documentation/arch/x86/amd_hsmp.rst b/Documentation/arch/x86/amd_hsmp.rst new file mode 100644 index 000000000000..440e4b645a1c --- /dev/null +++ b/Documentation/arch/x86/amd_hsmp.rst @@ -0,0 +1,86 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================ +AMD HSMP interface +============================================ + +Newer Fam19h EPYC server line of processors from AMD support system +management functionality via HSMP (Host System Management Port). + +The Host System Management Port (HSMP) is an interface to provide +OS-level software with access to system management functions via a +set of mailbox registers. + +More details on the interface can be found in chapter +"7 Host System Management Port (HSMP)" of the family/model PPR +Eg: https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip + +HSMP interface is supported on EPYC server CPU models only. + + +HSMP device +============================================ + +amd_hsmp driver under the drivers/platforms/x86/ creates miscdevice +/dev/hsmp to let user space programs run hsmp mailbox commands. + +$ ls -al /dev/hsmp +crw-r--r-- 1 root root 10, 123 Jan 21 21:41 /dev/hsmp + +Characteristics of the dev node: + * Write mode is used for running set/configure commands + * Read mode is used for running get/status monitor commands + +Access restrictions: + * Only root user is allowed to open the file in write mode. + * The file can be opened in read mode by all the users. + +In-kernel integration: + * Other subsystems in the kernel can use the exported transport + function hsmp_send_message(). + * Locking across callers is taken care by the driver. + + +An example +========== + +To access hsmp device from a C program. +First, you need to include the headers:: + + #include <linux/amd_hsmp.h> + +Which defines the supported messages/message IDs. + +Next thing, open the device file, as follows:: + + int file; + + file = open("/dev/hsmp", O_RDWR); + if (file < 0) { + /* ERROR HANDLING; you can check errno to see what went wrong */ + exit(1); + } + +The following IOCTL is defined: + +``ioctl(file, HSMP_IOCTL_CMD, struct hsmp_message *msg)`` + The argument is a pointer to a:: + + struct hsmp_message { + __u32 msg_id; /* Message ID */ + __u16 num_args; /* Number of input argument words in message */ + __u16 response_sz; /* Number of expected output/response words */ + __u32 args[HSMP_MAX_MSG_LEN]; /* argument/response buffer */ + __u16 sock_ind; /* socket number */ + }; + +The ioctl would return a non-zero on failure; you can read errno to see +what happened. The transaction returns 0 on success. + +More details on the interface and message definitions can be found in chapter +"7 Host System Management Port (HSMP)" of the respective family/model PPR +eg: https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip + +User space C-APIs are made available by linking against the esmi library, +which is provided by the E-SMS project https://developer.amd.com/e-sms/. +See: https://github.com/amd/esmi_ib_library diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst new file mode 100644 index 000000000000..33520ecdb37a --- /dev/null +++ b/Documentation/arch/x86/boot.rst @@ -0,0 +1,1443 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=========================== +The Linux/x86 Boot Protocol +=========================== + +On the x86 platform, the Linux kernel uses a rather complicated boot +convention. This has evolved partially due to historical aspects, as +well as the desire in the early days to have the kernel itself be a +bootable image, the complicated PC memory model and due to changed +expectations in the PC industry caused by the effective demise of +real-mode DOS as a mainstream operating system. + +Currently, the following versions of the Linux/x86 boot protocol exist. + +============= ============================================================ +Old kernels zImage/Image support only. Some very early kernels + may not even support a command line. + +Protocol 2.00 (Kernel 1.3.73) Added bzImage and initrd support, as + well as a formalized way to communicate between the + boot loader and the kernel. setup.S made relocatable, + although the traditional setup area still assumed + writable. + +Protocol 2.01 (Kernel 1.3.76) Added a heap overrun warning. + +Protocol 2.02 (Kernel 2.4.0-test3-pre3) New command line protocol. + Lower the conventional memory ceiling. No overwrite + of the traditional setup area, thus making booting + safe for systems which use the EBDA from SMM or 32-bit + BIOS entry points. zImage deprecated but still + supported. + +Protocol 2.03 (Kernel 2.4.18-pre1) Explicitly makes the highest possible + initrd address available to the bootloader. + +Protocol 2.04 (Kernel 2.6.14) Extend the syssize field to four bytes. + +Protocol 2.05 (Kernel 2.6.20) Make protected mode kernel relocatable. + Introduce relocatable_kernel and kernel_alignment fields. + +Protocol 2.06 (Kernel 2.6.22) Added a field that contains the size of + the boot command line. + +Protocol 2.07 (Kernel 2.6.24) Added paravirtualised boot protocol. + Introduced hardware_subarch and hardware_subarch_data + and KEEP_SEGMENTS flag in load_flags. + +Protocol 2.08 (Kernel 2.6.26) Added crc32 checksum and ELF format + payload. Introduced payload_offset and payload_length + fields to aid in locating the payload. + +Protocol 2.09 (Kernel 2.6.26) Added a field of 64-bit physical + pointer to single linked list of struct setup_data. + +Protocol 2.10 (Kernel 2.6.31) Added a protocol for relaxed alignment + beyond the kernel_alignment added, new init_size and + pref_address fields. Added extended boot loader IDs. + +Protocol 2.11 (Kernel 3.6) Added a field for offset of EFI handover + protocol entry point. + +Protocol 2.12 (Kernel 3.8) Added the xloadflags field and extension fields + to struct boot_params for loading bzImage and ramdisk + above 4G in 64bit. + +Protocol 2.13 (Kernel 3.14) Support 32- and 64-bit flags being set in + xloadflags to support booting a 64-bit kernel from 32-bit + EFI + +Protocol 2.14 BURNT BY INCORRECT COMMIT + ae7e1238e68f2a472a125673ab506d49158c1889 + (x86/boot: Add ACPI RSDP address to setup_header) + DO NOT USE!!! ASSUME SAME AS 2.13. + +Protocol 2.15 (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max. +============= ============================================================ + +.. note:: + The protocol version number should be changed only if the setup header + is changed. There is no need to update the version number if boot_params + or kernel_info are changed. Additionally, it is recommended to use + xloadflags (in this case the protocol version number should not be + updated either) or kernel_info to communicate supported Linux kernel + features to the boot loader. Due to very limited space available in + the original setup header every update to it should be considered + with great care. Starting from the protocol 2.15 the primary way to + communicate things to the boot loader is the kernel_info. + + +Memory Layout +============= + +The traditional memory map for the kernel loader, used for Image or +zImage kernels, typically looks like:: + + | | + 0A0000 +------------------------+ + | Reserved for BIOS | Do not use. Reserved for BIOS EBDA. + 09A000 +------------------------+ + | Command line | + | Stack/heap | For use by the kernel real-mode code. + 098000 +------------------------+ + | Kernel setup | The kernel real-mode code. + 090200 +------------------------+ + | Kernel boot sector | The kernel legacy boot sector. + 090000 +------------------------+ + | Protected-mode kernel | The bulk of the kernel image. + 010000 +------------------------+ + | Boot loader | <- Boot sector entry point 0000:7C00 + 001000 +------------------------+ + | Reserved for MBR/BIOS | + 000800 +------------------------+ + | Typically used by MBR | + 000600 +------------------------+ + | BIOS use only | + 000000 +------------------------+ + +When using bzImage, the protected-mode kernel was relocated to +0x100000 ("high memory"), and the kernel real-mode block (boot sector, +setup, and stack/heap) was made relocatable to any address between +0x10000 and end of low memory. Unfortunately, in protocols 2.00 and +2.01 the 0x90000+ memory range is still used internally by the kernel; +the 2.02 protocol resolves that problem. + +It is desirable to keep the "memory ceiling" -- the highest point in +low memory touched by the boot loader -- as low as possible, since +some newer BIOSes have begun to allocate some rather large amounts of +memory, called the Extended BIOS Data Area, near the top of low +memory. The boot loader should use the "INT 12h" BIOS call to verify +how much low memory is available. + +Unfortunately, if INT 12h reports that the amount of memory is too +low, there is usually nothing the boot loader can do but to report an +error to the user. The boot loader should therefore be designed to +take up as little space in low memory as it reasonably can. For +zImage or old bzImage kernels, which need data written into the +0x90000 segment, the boot loader should make sure not to use memory +above the 0x9A000 point; too many BIOSes will break above that point. + +For a modern bzImage kernel with boot protocol version >= 2.02, a +memory layout like the following is suggested:: + + ~ ~ + | Protected-mode kernel | + 100000 +------------------------+ + | I/O memory hole | + 0A0000 +------------------------+ + | Reserved for BIOS | Leave as much as possible unused + ~ ~ + | Command line | (Can also be below the X+10000 mark) + X+10000 +------------------------+ + | Stack/heap | For use by the kernel real-mode code. + X+08000 +------------------------+ + | Kernel setup | The kernel real-mode code. + | Kernel boot sector | The kernel legacy boot sector. + X +------------------------+ + | Boot loader | <- Boot sector entry point 0000:7C00 + 001000 +------------------------+ + | Reserved for MBR/BIOS | + 000800 +------------------------+ + | Typically used by MBR | + 000600 +------------------------+ + | BIOS use only | + 000000 +------------------------+ + + ... where the address X is as low as the design of the boot loader permits. + + +The Real-Mode Kernel Header +=========================== + +In the following text, and anywhere in the kernel boot sequence, "a +sector" refers to 512 bytes. It is independent of the actual sector +size of the underlying medium. + +The first step in loading a Linux kernel should be to load the +real-mode code (boot sector and setup code) and then examine the +following header at offset 0x01f1. The real-mode code can total up to +32K, although the boot loader may choose to load only the first two +sectors (1K) and then examine the bootup sector size. + +The header looks like: + +=========== ======== ===================== ============================================ +Offset/Size Proto Name Meaning +=========== ======== ===================== ============================================ +01F1/1 ALL(1) setup_sects The size of the setup in sectors +01F2/2 ALL root_flags If set, the root is mounted readonly +01F4/4 2.04+(2) syssize The size of the 32-bit code in 16-byte paras +01F8/2 ALL ram_size DO NOT USE - for bootsect.S use only +01FA/2 ALL vid_mode Video mode control +01FC/2 ALL root_dev Default root device number +01FE/2 ALL boot_flag 0xAA55 magic number +0200/2 2.00+ jump Jump instruction +0202/4 2.00+ header Magic signature "HdrS" +0206/2 2.00+ version Boot protocol version supported +0208/4 2.00+ realmode_swtch Boot loader hook (see below) +020C/2 2.00+ start_sys_seg The load-low segment (0x1000) (obsolete) +020E/2 2.00+ kernel_version Pointer to kernel version string +0210/1 2.00+ type_of_loader Boot loader identifier +0211/1 2.00+ loadflags Boot protocol option flags +0212/2 2.00+ setup_move_size Move to high memory size (used with hooks) +0214/4 2.00+ code32_start Boot loader hook (see below) +0218/4 2.00+ ramdisk_image initrd load address (set by boot loader) +021C/4 2.00+ ramdisk_size initrd size (set by boot loader) +0220/4 2.00+ bootsect_kludge DO NOT USE - for bootsect.S use only +0224/2 2.01+ heap_end_ptr Free memory after setup end +0226/1 2.02+(3) ext_loader_ver Extended boot loader version +0227/1 2.02+(3) ext_loader_type Extended boot loader ID +0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line +022C/4 2.03+ initrd_addr_max Highest legal initrd address +0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel +0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not +0235/1 2.10+ min_alignment Minimum alignment, as a power of two +0236/2 2.12+ xloadflags Boot protocol option flags +0238/4 2.06+ cmdline_size Maximum size of the kernel command line +023C/4 2.07+ hardware_subarch Hardware subarchitecture +0240/8 2.07+ hardware_subarch_data Subarchitecture-specific data +0248/4 2.08+ payload_offset Offset of kernel payload +024C/4 2.08+ payload_length Length of kernel payload +0250/8 2.09+ setup_data 64-bit physical pointer to linked list + of struct setup_data +0258/8 2.10+ pref_address Preferred loading address +0260/4 2.10+ init_size Linear memory required during initialization +0264/4 2.11+ handover_offset Offset of handover entry point +0268/4 2.15+ kernel_info_offset Offset of the kernel_info +=========== ======== ===================== ============================================ + +.. note:: + (1) For backwards compatibility, if the setup_sects field contains 0, the + real value is 4. + + (2) For boot protocol prior to 2.04, the upper two bytes of the syssize + field are unusable, which means the size of a bzImage kernel + cannot be determined. + + (3) Ignored, but safe to set, for boot protocols 2.02-2.09. + +If the "HdrS" (0x53726448) magic number is not found at offset 0x202, +the boot protocol version is "old". Loading an old kernel, the +following parameters should be assumed:: + + Image type = zImage + initrd not supported + Real-mode kernel must be located at 0x90000. + +Otherwise, the "version" field contains the protocol version, +e.g. protocol version 2.01 will contain 0x0201 in this field. When +setting fields in the header, you must make sure only to set fields +supported by the protocol version in use. + + +Details of Header Fields +======================== + +For each field, some are information from the kernel to the bootloader +("read"), some are expected to be filled out by the bootloader +("write"), and some are expected to be read and modified by the +bootloader ("modify"). + +All general purpose boot loaders should write the fields marked +(obligatory). Boot loaders who want to load the kernel at a +nonstandard address should fill in the fields marked (reloc); other +boot loaders can ignore those fields. + +The byte order of all fields is littleendian (this is x86, after all.) + +============ =========== +Field name: setup_sects +Type: read +Offset/size: 0x1f1/1 +Protocol: ALL +============ =========== + + The size of the setup code in 512-byte sectors. If this field is + 0, the real value is 4. The real-mode code consists of the boot + sector (always one 512-byte sector) plus the setup code. + +============ ================= +Field name: root_flags +Type: modify (optional) +Offset/size: 0x1f2/2 +Protocol: ALL +============ ================= + + If this field is nonzero, the root defaults to readonly. The use of + this field is deprecated; use the "ro" or "rw" options on the + command line instead. + +============ =============================================== +Field name: syssize +Type: read +Offset/size: 0x1f4/4 (protocol 2.04+) 0x1f4/2 (protocol ALL) +Protocol: 2.04+ +============ =============================================== + + The size of the protected-mode code in units of 16-byte paragraphs. + For protocol versions older than 2.04 this field is only two bytes + wide, and therefore cannot be trusted for the size of a kernel if + the LOAD_HIGH flag is set. + +============ =============== +Field name: ram_size +Type: kernel internal +Offset/size: 0x1f8/2 +Protocol: ALL +============ =============== + + This field is obsolete. + +============ =================== +Field name: vid_mode +Type: modify (obligatory) +Offset/size: 0x1fa/2 +============ =================== + + Please see the section on SPECIAL COMMAND LINE OPTIONS. + +============ ================= +Field name: root_dev +Type: modify (optional) +Offset/size: 0x1fc/2 +Protocol: ALL +============ ================= + + The default root device device number. The use of this field is + deprecated, use the "root=" option on the command line instead. + +============ ========= +Field name: boot_flag +Type: read +Offset/size: 0x1fe/2 +Protocol: ALL +============ ========= + + Contains 0xAA55. This is the closest thing old Linux kernels have + to a magic number. + +============ ======= +Field name: jump +Type: read +Offset/size: 0x200/2 +Protocol: 2.00+ +============ ======= + + Contains an x86 jump instruction, 0xEB followed by a signed offset + relative to byte 0x202. This can be used to determine the size of + the header. + +============ ======= +F |