x86 assembly language is the assembly language for the x86 class of processors, which includes Intel's Pentium series and AMD's Athlon series.
The modern x86 instruction set is a series of extensions to instruction sets whose lineage goes back to the Intel 8008 microprocessor; the x86 architecture proper began with the Intel 8086. Binary backward compatibility is nearly complete from the 8086 through to modern Pentium 4, Intel Core, Athlon 64, Opteron and similar processors. (There are a few unusual exceptions, such as the counted shift instructions, corrections to the original PUSHA instruction, some orphaned Intel 80286 semantics, the dropped LOADALL instruction, and the Pentium 4 giving up on precise FPU operation counts.) Each successive extension has either been added directly or been accompanied by new processor execution modes.
Unlike some other processor families, such as Power and 68K, x86 has no hardware support for multiple stacks, so data and control-flow information (such as return addresses) must share a single stack.
To work within this limitation, the ENTER/LEAVE instructions, or direct manipulation of the stack pointer register (ESP), can be used to reserve space for local data on the stack. The architecture also includes PUSH/POP instructions for using the stack directly with integer and address quantities. This simplifies ABI specifications for call-stack support compared with some RISC architectures, which must spell out call-stack details more explicitly.
The combination of a single hardware stack and the small number of other available registers creates one of the most significant bottlenecks in x86 code.
x86 processors have several execution modes (real mode, protected mode and long mode). The instruction set is based on similar ideas in each mode, but memory is accessed differently in each, so each mode calls for different programming strategies.
In real mode, a memory reference specifies a 16-bit offset within a segment; the actual 20-bit address is SEGMENT * 16 + OFFSET, where SEGMENT is the contents of the segment register for the segment and OFFSET is the offset within that segment. Segments are either implicit or made explicit with a segment override prefix. By default, references through the general registers use the DS (data) segment, references through the stack registers (SP and BP) use the SS (stack) segment, and IP uses the CS (code) segment. This segmented architecture can address just over 1MB of memory, but only 64K within a given segment at any one time.
On IBM PC compatible machines this also caused great confusion over the "A20" line. Addresses from 0x100000 to 0x10FFEF (FFFF:FFFF) can be formed, but the 8086 has only 20 address lines (A0 to A19), so the carry into bit 20 is dropped and such addresses wrap around to the bottom of memory. Starting with the 80286, processors have more address lines and do not wrap, so PC/AT-compatible systems added a gate on the A20 line for software that relied on the wraparound.
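The address arithmetic above can be sketched in C. This is a minimal model; the helper names real_mode_address and address_8086 are hypothetical, and the second one masks the result to 20 bits to mimic the 8086 wraparound.

```c
#include <stdint.h>

/* Real-mode address for SEGMENT:OFFSET: SEGMENT * 16 + OFFSET.
   The result can reach 0x10FFEF (FFFF:FFFF), which needs 21 bits. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* An 8086 drives only 20 address lines (A0-A19): the carry into bit 20
   is lost, so addresses above 0xFFFFF wrap to the bottom of memory. */
uint32_t address_8086(uint16_t segment, uint16_t offset)
{
    return real_mode_address(segment, offset) & 0xFFFFFu;
}
```

Note that many segment:offset pairs name the same byte, for example 1000:0234 and 1023:0004.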
To use more than 64K of memory, a program must change the segment registers. This created great complications for C compiler implementors, who introduced unusual pointer types such as "near", "far" and "huge" to exploit the implicit nature of the segmented architecture to different degrees: near pointers hold a 16-bit offset within an implied segment, while far and huge pointers hold a segment address and an offset within that segment, with huge pointers additionally normalized so that pointer arithmetic can cross 64K boundaries.
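As an illustration of what a "huge" pointer implementation has to do, the sketch below normalizes a segment:offset pair so that the offset is below 16, and performs arithmetic that crosses 64K boundaries. The far_ptr struct and both functions are hypothetical, not any particular compiler's runtime, and the sketch assumes every address stays below 1MB.

```c
#include <stdint.h>

/* Hypothetical representation of a 16-bit compiler's far pointer. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Normalize so the offset is in 0..15. The normalized form of an address
   is unique, which makes huge-pointer comparison straightforward.
   Assumes the address is below 1MB. */
struct far_ptr huge_normalize(struct far_ptr p)
{
    uint32_t linear = ((uint32_t)p.segment << 4) + p.offset;
    struct far_ptr n = { (uint16_t)(linear >> 4), (uint16_t)(linear & 0xF) };
    return n;
}

/* Huge-pointer arithmetic: advance by a byte count that may cross a 64K
   boundary (a plain far pointer would just wrap its 16-bit offset). */
struct far_ptr huge_add(struct far_ptr p, uint32_t bytes)
{
    uint32_t linear = ((uint32_t)p.segment << 4) + p.offset + bytes;
    struct far_ptr n = { (uint16_t)(linear >> 4), (uint16_t)(linear & 0xF) };
    return n;
}
```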
In protected mode, a segment register no longer holds the segment's base address divided by 16, as in real mode; instead it contains a "selector" that points to a system-level structure called a segment descriptor. A segment descriptor contains the physical address of the beginning of the segment, the length of the segment, and the access permissions for that segment. The offset is checked against the segment length, and an offset referring to a location outside the segment causes an exception. An offset referring to a location inside the segment is added to the physical address of the beginning of the segment to give the physical address corresponding to that offset.
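The limit check and base addition can be sketched as follows. The segment_descriptor struct is a hypothetical simplification that keeps only a base and a limit (taken here as the largest valid offset); real descriptors also encode access rights and other flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical segment descriptor: base and limit only. */
struct segment_descriptor {
    uint32_t base;   /* address of the start of the segment */
    uint32_t limit;  /* largest valid offset within the segment */
};

/* Translate an offset within the segment. Returns false where the CPU
   would raise a protection exception (offset beyond the limit). */
bool segment_translate(const struct segment_descriptor *d, uint32_t offset,
                       uint32_t *address)
{
    if (offset > d->limit)
        return false;
    *address = d->base + offset;
    return true;
}
```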
The instruction set in 16-bit protected mode is backward compatible with the one used in real mode.
In this mode, memory beyond 64K is accessed with the same techniques as in real mode, except that "far" (long) pointers contain segment selectors rather than segment addresses.
In the 80386, 32-bit protected mode was added. It enables full 32-bit addressing, paging, a few more registers, and some new instructions to handle the 32-bit addressing.
In 32-bit protected mode with paging disabled, the address in a segment descriptor is the physical address of the beginning of the segment, and the address calculated from that base and the offset within the segment is a physical address. With paging enabled, the address in a segment descriptor is the "linear" address of the beginning of the segment in a 32-bit address space, and the address calculated from that base and the offset is a linear address in that space. Linear addresses are translated to physical addresses through a page table. Linear addresses are 32 bits wide. By default, physical addresses are also 32 bits wide; however, a paging extension called Physical Address Extension (PAE), first added in the Intel Pentium Pro, provides 4 additional bits of physical address (36 bits in total). PAE does not change the length of segment offsets or linear addresses, which remain 32 bits.
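For the common case of 32-bit paging with 4 KB pages and PAE disabled, a linear address splits into a page-directory index, a page-table index and a byte offset within the page. The sketch below shows that split and the final combination with a page-table entry; the struct and function names are hypothetical, and the directory walk itself is left out.

```c
#include <stdint.h>

/* Bits 31-22: page-directory index; bits 21-12: page-table index;
   bits 11-0: offset within the 4 KB page. */
struct linear_parts {
    uint32_t dir_index;
    uint32_t table_index;
    uint32_t offset;
};

struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FF;
    p.offset      = linear & 0xFFF;
    return p;
}

/* Bits 31-12 of a page-table entry hold the physical page frame and bit 0
   is the present bit. For a non-present page this sets *fault (where the
   CPU would raise a page fault) and returns 0. */
uint32_t pte_to_physical(uint32_t pte, uint32_t offset, int *fault)
{
    if ((pte & 1u) == 0) {
        *fault = 1;
        return 0;
    }
    *fault = 0;
    return (pte & 0xFFFFF000u) | offset;
}
```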
In this mode, as offsets within segments are 32 bits, there was less need for explicit segmentation, and, as 48-bit segmented addresses (segment selectors plus offset within segment) were translated to 32-bit linear addresses, explicit segmentation didn't conveniently expand the address space available to a program. Therefore, C compiler vendors and operating system vendors rarely supported segmented addresses in 32-bit protected mode.
x86 processors that support protected mode boot into real mode for backward compatibility with the older 8086 class of processors. Typically, the operating system is responsible for switching to protected mode if it so wishes.
To switch to long mode, the processor has to first switch from real mode to protected mode, and then to long mode.
Starting with the Intel 80386, the x86 in 32-bit protected mode extended the 16-bit registers to 32 bits (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, EFLAGS, EIP). The older 16-bit registers are overlaid on the bottom half of the 32-bit registers and can be accessed with an operand-size override prefix. There is no direct access to the upper 16 bits of a 32-bit register; instead, Intel generalized the addressing modes so that every general register except ESP can be used as a scaled index, and so that EBP can be used as a general register as well as a stack frame register.
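The overlay of the 16-bit and 8-bit register names on a 32-bit register can be modeled with masks and shifts. The function names are hypothetical; AH and AL, the byte halves of AX, carry over from the 16-bit architecture.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX; AL and AH are the low and high bytes of AX.
   The upper 16 bits of EAX have no register name of their own. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFF); }
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFF); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFF); }

/* Writing AX (e.g. "mov ax, 0xABCD") changes only the low 16 bits of EAX. */
uint32_t write_ax(uint32_t eax, uint16_t ax)
{
    return (eax & 0xFFFF0000u) | ax;
}
```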
Starting with the AMD Opteron processor, the x86 in 64-bit long mode (as a subset of AMD64 or x86-64 mode) extended the 32-bit registers in a similar way that 32-bit protected mode did before it (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP). However, AMD also added 8 additional 64-bit general registers (R8, R9, ..., R15).
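The addressing modes did not change dramatically from 32-bit mode, except that addresses were extended to 64 bits. Virtual addresses must be in "canonical" form, sign-extended from the highest implemented bit (bit 47 on processors with 48-bit virtual addresses), so the usable address space grows equally at the top and bottom of the 64-bit range. Segmentation is also largely disabled: the CS, DS, ES and SS segment bases are treated as zero and segment limits are not checked, while the FS and GS bases remain usable.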
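A canonical-address check for processors with 48-bit virtual addresses might look like this. The function name is hypothetical, and the 48-bit width is an assumption; processors that implement more virtual address bits need a correspondingly wider check.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, bits 63..48 must be copies of bit 47,
   so bits 63..47 (17 bits) must be all zeros or all ones. */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1FFFF;
}
```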
The x87 floating-point instructions can operate with one of three internal precisions, 32-bit (single), 64-bit (double) or 80-bit (extended), and with one of several rounding modes. For compatibility with non-x86 data sources, such as results computed on a RISC processor, which typically supports only 64-bit precision, the precision mode is usually set to 64-bit and the rounding mode to round-to-nearest. However, some C and Fortran compilers use the full 80-bit precision for maximum accuracy.
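On the x87 FPU, these modes are selected by fields of the FPU control word: precision control in bits 8-9 and rounding control in bits 10-11. The decoder below is a sketch with hypothetical enum and function names; 0x037F is the value the control word holds after FINIT (extended precision, round to nearest).

```c
#include <stdint.h>

/* Hypothetical names for the control-word field values. */
enum x87_precision { PC_SINGLE = 0, PC_RESERVED = 1, PC_DOUBLE = 2, PC_EXTENDED = 3 };
enum x87_rounding  { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TOWARD_ZERO = 3 };

/* Precision control: bits 8-9 of the control word. */
enum x87_precision x87_get_precision(uint16_t cw)
{
    return (enum x87_precision)((cw >> 8) & 3);
}

/* Rounding control: bits 10-11 of the control word. */
enum x87_rounding x87_get_rounding(uint16_t cw)
{
    return (enum x87_rounding)((cw >> 10) & 3);
}

/* Return cw with only the precision field replaced. */
uint16_t x87_set_precision(uint16_t cw, enum x87_precision pc)
{
    return (uint16_t)((cw & ~0x0300u) | ((unsigned)pc << 8));
}
```

Setting double precision on the FINIT default, for example, turns 0x037F into 0x027F.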
Note that the AMD64 did not add additional entries to the floating point stack, though the additional integer registers can be used for memory addressing.
MMX instructions treat the eight 64-bit MMX registers (MM0 through MM7, which alias the x87 floating-point registers) as a single 64-bit value, a pair of 32-bit integers, four 16-bit integers, or eight 8-bit integers.
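The lane-wise behaviour can be sketched in portable C. The function below models PADDW (add packed 16-bit words) on a 64-bit value; it illustrates the semantics only, and the function name is simply the mnemonic reused as a C identifier.

```c
#include <stdint.h>

/* Treat a 64-bit value as four 16-bit lanes and add them independently,
   with wraparound and no carry from one lane into the next. */
uint64_t paddw(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t sum = (uint16_t)(x + y);   /* wraps modulo 2^16 */
        result |= (uint64_t)sum << (16 * lane);
    }
    return result;
}
```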
Note that the AMD64 architecture did not add additional MMX registers, though the additional integer registers can be used for memory addressing.
The format of the SSE (XMM) registers depends on the instructions using them. The original SSE instruction set uses them as 4 simultaneous 32-bit floating-point values. SSE2 also allows them to be used as 2 simultaneous 64-bit floating-point values, 4 simultaneous 32-bit integers, 8 simultaneous 16-bit integers, or 16 simultaneous 8-bit integers.
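The point that the interpretation depends on the instruction can be made concrete with a C union over the same 16 bytes. The union name is hypothetical, and it assumes float and double are 32 and 64 bits wide, as they are with x86 compilers.

```c
#include <stdint.h>

/* One 128-bit XMM value seen in each of the lane formats listed above.
   The bits are shared; only the instruction decides how to read them. */
union xmm_value {
    float   f32[4];   /* SSE:  4 x 32-bit floating point */
    double  f64[2];   /* SSE2: 2 x 64-bit floating point */
    int32_t i32[4];   /* SSE2: 4 x 32-bit integer */
    int16_t i16[8];   /* SSE2: 8 x 16-bit integer */
    int8_t  i8[16];   /* SSE2: 16 x 8-bit integer */
};
```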
Register-to-register x87 instructions usually take the form F(OP) st, st(*) or F(OP) st(*), st, where st is equivalent to st(0) and st(*) is one of the 8 stack registers (st(0), st(1), ..., st(7)). As with integer instructions, the first operand is both the first source operand and the destination. FSUBR and FDIVR should be singled out: they swap the source operands before performing the subtraction or division. The addition, subtraction, multiplication, division, store and comparison instructions include forms that pop the top of the stack after the operation completes. For example, FADDP st(1), st computes st(1) = st(1) + st(0) and then removes st(0) from the top of the stack, so the result, formerly in st(1), becomes the new st(0).
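A toy model of the register stack makes the pop behaviour and the reversed operands of FSUBR concrete. The struct and function names are hypothetical, and unlike the real FPU the model performs no stack overflow or underflow checking.

```c
/* reg[depth - 1] is st(0), the top of the stack. */
struct x87_stack {
    double reg[8];
    int depth;
};

/* Pointer to st(i). */
static double *st(struct x87_stack *s, int i) { return &s->reg[s->depth - 1 - i]; }

/* FLD: push a value, which becomes st(0). */
void fld(struct x87_stack *s, double v) { s->reg[s->depth++] = v; }

/* FADDP st(1), st: st(1) = st(1) + st(0), then pop, so the sum is st(0). */
void faddp(struct x87_stack *s)
{
    *st(s, 1) = *st(s, 1) + *st(s, 0);
    s->depth--;
}

/* FSUBR st, st(1): reversed operands, so st(0) = st(1) - st(0). */
void fsubr_st0_st1(struct x87_stack *s)
{
    *st(s, 0) = *st(s, 1) - *st(s, 0);
}
```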
These SIMD instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and extracting values within the registers, as well as instructions for moving data between the integer registers and the SSE/MMX registers.
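As one example of these shuffle instructions, PSHUFD picks each destination doubleword from the source according to a 2-bit field of an immediate byte. The sketch below models that selection; the function name is hypothetical.

```c
#include <stdint.h>

/* Destination dword i comes from source dword (imm8 >> (2 * i)) & 3. */
void pshufd(uint32_t dst[4], const uint32_t src[4], uint8_t imm8)
{
    uint32_t tmp[4];                       /* allows dst == src */
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

With imm8 = 0x1B the four dwords come out in reverse order; with imm8 = 0x00, dword 0 is broadcast to all four positions.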
The x86 instruction set includes string load, store and move instructions (LODS, STOS and MOVS), which operate on a given size (suffix B for an 8-bit byte, W for a 16-bit word, D for a 32-bit doubleword) and then advance the implicit address register: SI for LODS, DI for STOS, and both for MOVS. The address registers are incremented when the direction flag (DF) is clear and decremented when it is set. For loads and stores, the implicit target or source register is AL, AX or EAX, depending on the size. LODS reads from DS:SI, STOS writes to ES:DI, and MOVS copies from DS:SI to ES:DI. On modern x86 processors these complex instructions generally offer no performance advantage over equivalent sequences of simple load, store and address-increment instructions.
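One MOVSB step, including the direction flag and the DS:SI and ES:DI addressing, can be modeled as follows. The string_regs struct and movsb function are hypothetical, and mem stands in for real-mode memory; it must be large enough for every address the step touches.

```c
#include <stdint.h>

/* Hypothetical subset of 16-bit register state used by MOVSB. */
struct string_regs {
    uint16_t ds, si;   /* source:      DS:SI */
    uint16_t es, di;   /* destination: ES:DI */
    int df;            /* direction flag: 0 = increment, 1 = decrement */
};

/* Copy the byte at DS:SI to ES:DI, then step SI and DI by one byte,
   forward when DF = 0 and backward when DF = 1. */
void movsb(struct string_regs *r, uint8_t *mem)
{
    uint32_t src = ((uint32_t)r->ds << 4) + r->si;
    uint32_t dst = ((uint32_t)r->es << 4) + r->di;
    mem[dst] = mem[src];
    int16_t step = r->df ? -1 : 1;
    r->si = (uint16_t)(r->si + step);
    r->di = (uint16_t)(r->di + step);
}
```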
The stack is implemented with an implicitly decrementing (push) and incrementing (pop) stack pointer. In 16-bit mode the top of the stack is addressed as SS:[SP]; in 32-bit mode it is SS:[ESP]. The stack pointer points to the most recently pushed value: a push first decrements the stack pointer by the operand size and then stores the value, and a pop reads the value and then increments the pointer. The operand size normally matches the operating mode of the processor (16, 32 or 64 bits), which is the default width of the PUSH/POP/CALL/RET instructions. The ENTER and LEAVE instructions reserve and release space at the top of the stack while setting up a stack frame pointer in BP/EBP/RBP. However, the SP/ESP/RSP register can also be set directly or adjusted by addition and subtraction, so ENTER/LEAVE are generally unnecessary. Other stack instructions include PUSHF/POPF, which store and retrieve the (E)FLAGS register, and PUSHA/POPA, which store and retrieve all of the general-purpose integer registers.
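The decrement-then-store rule for push can be sketched with a toy 32-bit stack. The struct and function names are hypothetical; values are stored in host byte order, which matches x86 when the code runs on an x86 host.

```c
#include <stdint.h>
#include <string.h>

/* mem models the stack area; esp indexes the most recently pushed value. */
struct stack32 {
    uint8_t mem[64];
    uint32_t esp;
};

/* PUSH: decrement ESP by 4, then store. */
void push32(struct stack32 *s, uint32_t v)
{
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &v, 4);
}

/* POP: load, then increment ESP by 4. */
uint32_t pop32(struct stack32 *s)
{
    uint32_t v;
    memcpy(&v, &s->mem[s->esp], 4);
    s->esp += 4;
    return v;
}
```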
A SIMD load or store assumes that the values are packed in adjacent memory locations and places them in the register in sequential little-endian order. Some SSE load and store instructions, such as MOVAPS, require a 16-byte-aligned address and raise an exception otherwise. The SIMD instruction sets also include "prefetch" instructions, which perform a load without targeting any register, to bring data into the cache. The SSE instruction sets also include non-temporal store instructions, which write straight to memory without allocating a cache line if the destination is not already cached (otherwise they behave like regular stores).
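The 16-byte alignment condition checked by instructions such as MOVAPS amounts to the low four address bits being zero. A minimal check, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when p is a multiple of 16, as aligned SSE loads/stores require. */
bool is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```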
Most integer, floating-point and SIMD instructions can take a memory operand, specified with a complex addressing mode (base + index * scale + displacement), as their second source operand. Integer instructions can also accept one memory operand as the destination.
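In 32-bit code the general memory operand has the form base + index * scale + displacement, as in [ebx + esi*4 + 8]. The sketch below computes that effective address; the function name is hypothetical, and the caller is assumed to pass a scale of 1, 2, 4 or 8.

```c
#include <stdint.h>

/* Effective address of a 32-bit memory operand; the sum wraps at 32 bits. */
uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale,
                           int32_t displacement)
{
    return base + index * scale + (uint32_t)displacement;
}
```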
The jmp instruction can take an immediate address, a register or an indirect (memory) address as its operand. (Most RISC processors support only a link register or a short immediate displacement for jumps.)
Also supported are several conditional jumps, including je (jump on equality), jne (jump on inequality), jg (jump on greater than, signed), jl (jump on less than, signed), ja (jump on above/greater than, unsigned) and jb (jump on below/less than, unsigned). These conditional jumps test specific bits in the (E)FLAGS register. Many arithmetic and logic operations set, clear or complement these flags depending on their result. The comparison instructions cmp (compare) and test set the flags as if they had performed a subtraction or a bitwise AND, respectively, without altering the values of the operands. There are also instructions such as clc (clear carry flag) and cmc (complement carry flag) that work on the flags directly. Floating-point comparisons are performed with the FCOM or FICOM instructions, whose results, held in the FPU status word, must eventually be transferred to the integer flags (for example with FSTSW and SAHF).
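The relationship between cmp, the flags and the signed and unsigned jumps can be sketched in C. The cmp32 function computes ZF, SF, CF and OF for a 32-bit subtraction, and the cond_ helpers evaluate the conditions tested by je, jg, jl, ja and jb; all of these names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool zf, sf, cf, of; };

/* Flags produced by cmp a, b, i.e. by computing a - b. */
struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;                       /* sign of the result */
    f.cf = a < b;                               /* unsigned borrow */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;     /* signed overflow */
    return f;
}

bool cond_je(struct flags f) { return f.zf; }
bool cond_jg(struct flags f) { return !f.zf && f.sf == f.of; }  /* signed */
bool cond_jl(struct flags f) { return f.sf != f.of; }           /* signed */
bool cond_ja(struct flags f) { return !f.cf && !f.zf; }         /* unsigned */
bool cond_jb(struct flags f) { return f.cf; }                   /* unsigned */
```

Comparing 0xFFFFFFFF with 1 shows the difference: read as signed, it is -1 and jl is taken; read as unsigned, it is the largest value and ja is taken.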
Each jump operation has three different forms, depending on the size of the operand. A short jump uses an 8-bit signed operand, which is an offset relative to the end of the jump instruction (the start of the next instruction). In real mode or 16-bit protected mode, a near jump uses a 16-bit operand and targets an address within the current code segment; in 32-bit protected mode, a near jump uses a 16-bit or 32-bit signed relative offset, like a short jump. A far jump uses a full segment:offset value (selector:offset in protected mode) as an absolute address. Near jumps also have indirect forms that take the destination from a register or memory, and far jumps have an indirect form that reads the segment and offset from memory.
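The short-jump arithmetic can be sketched as follows; the function names are hypothetical. For example, the 2-byte instruction EB FE has a displacement of -2 and therefore jumps to itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* The sign-extended 8-bit displacement is added to the address of the
   next instruction (the jump's own address plus its length). */
uint32_t short_jump_target(uint32_t jump_addr, uint32_t jump_len, int8_t rel8)
{
    return jump_addr + jump_len + (uint32_t)(int32_t)rel8;
}

/* Whether a jump at jump_addr to target fits the 2-byte short form
   (opcode EB plus rel8). */
bool fits_short_jump(uint32_t jump_addr, uint32_t target)
{
    int64_t rel = (int64_t)target - ((int64_t)jump_addr + 2);
    return rel >= -128 && rel <= 127;
}
```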
In addition to the simple jump operations, there are the call (call a subroutine) and ret (return from subroutine) instructions. Before transferring control to the subroutine, call pushes the offset of the instruction following the call (the return address) onto the stack; ret pops this value off the stack and jumps to it, returning control to that part of the program. In a far call, the current code segment (CS) is pushed before the offset, and a far return pops both.
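The order in which a far call saves CS and the return offset, and the order in which a far return restores them, can be modeled with a toy 16-bit stack. All names are hypothetical and only the stack effects are modeled; a near call would push only the return offset.

```c
#include <stdint.h>

/* Toy stack of 16-bit words; sp indexes the most recently pushed word
   and counts down, as on x86. */
struct stack16 {
    uint16_t word[32];
    int sp;
};

static void push16(struct stack16 *s, uint16_t v) { s->word[--s->sp] = v; }
static uint16_t pop16(struct stack16 *s) { return s->word[s->sp++]; }

/* Far call: push CS, then the return offset, then load the new CS:IP. */
void far_call(struct stack16 *s, uint16_t *cs, uint16_t *ip,
              uint16_t target_cs, uint16_t target_ip, uint16_t return_ip)
{
    push16(s, *cs);
    push16(s, return_ip);
    *cs = target_cs;
    *ip = target_ip;
}

/* Far return: pop the offset, then CS. */
void far_ret(struct stack16 *s, uint16_t *cs, uint16_t *ip)
{
    *ip = pop16(s);
    *cs = pop16(s);
}
```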
There are also two related instructions. The first is int (interrupt), which pushes the flags register and the return address (CS and IP) onto the stack and then performs a far call, except that instead of an address it uses an interrupt vector, an index into a table of interrupt handler addresses. The matching return-from-interrupt instruction is iret, which pops the return address and the flags. Software interrupts of this kind are used by some operating systems for system calls, and can also be used when debugging hardware interrupt handlers. Hardware interrupts are triggered by external hardware events.
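In real mode, the table of interrupt handler addresses is the interrupt vector table at physical address 0, with four bytes per vector: a 16-bit offset followed by a 16-bit segment, both little-endian. A lookup sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Read the handler address for a vector from a copy of the real-mode
   interrupt vector table; ivt must hold at least 256 * 4 = 1024 bytes. */
void ivt_lookup(const uint8_t *ivt, uint8_t vector,
                uint16_t *segment, uint16_t *offset)
{
    const uint8_t *e = ivt + (uint32_t)vector * 4;
    *offset  = (uint16_t)(e[0] | (e[1] << 8));
    *segment = (uint16_t)(e[2] | (e[3] << 8));
}
```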
The x86 instruction set architecture includes a mechanism for performing system-wide atomic operations. It ensures that an entire instruction executes without its sub-operations being interrupted by other bus-mastering devices in the system (such as graphics cards, DMA operations or other CPUs). This is an important feature for protecting critical sections on shared resources in multiprocessor systems.
The atomic mechanism is invoked explicitly with an instruction prefix called lock. The prefix can be applied to certain integer instructions that perform a load, an ALU operation and then a store within a single instruction. The XCHG instruction, when one of its operands is in memory, behaves as if the lock prefix were given.
The lock prefix is most useful with instructions such as CMPXCHG (and CMPXCHG8B), which perform a load, a compare and then, depending on the result of the compare, a conditional exchange. A simpler example is BTS, which copies a bit into the carry flag and then sets that bit in memory.
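From C, these locked instructions are usually reached through the C11 atomics library rather than written by hand. The sketch below builds an atomic increment from a compare-and-swap retry loop, and a BTS-like test-and-set of a single bit. The function names are hypothetical, and the exact instructions a compiler emits (typically lock cmpxchg, and lock bts or a lock cmpxchg loop for the bit operation) depend on the compiler and its options.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic increment as a compare-and-swap retry loop. */
int atomic_increment(atomic_int *counter)
{
    int old = atomic_load(counter);
    /* On failure, old is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ;
    return old + 1;
}

/* Atomically set one bit and report whether it was already set. */
bool atomic_test_and_set_bit(atomic_uint *word, unsigned bit)
{
    unsigned mask = 1u << bit;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}
```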
This mechanism compares favorably with the one used in the PowerPC architecture. The PowerPC uses a "load and reserve" instruction (lwarx), which establishes a reservation on the applicable cache line, paired with a subsequent "store conditional" instruction (stwcx.), which succeeds only if the reservation is still held. Each PowerPC processor can hold at most one reservation at a time, so care must be taken that preemptions between the two instructions do not interfere with the desired atomic semantics. For example, if an instruction between the load and the store causes a page fault, the page fault handler could not use the reservation mechanism for its own atomicity while still preserving the atomicity of the original code path. (Some PowerPC motherboards support system bus locking, but this is a single global mechanism and does not address the nested lock problem.)
By confining lock primitives to single, complete (CISC-like) instructions, the x86 can implement atomic operations, even in user-level code, independently of any other atomic primitives in the system.
N.B.: The statements above are incorrect. On an exception such as an interrupt, preemption or page fault, the exception handler need only cancel the existing reservation and can then use reservations for its own locks; when the exception returns, the interrupted lock code finds that its store-conditional failed because the reservation was lost, and simply tries again. Intel has no "nested locking" mechanism either: taking a page fault inside a compare-and-exchange loop that retries on failure amounts to essentially the same thing. The lock prefix applies solely to the instruction it prefixes.
This article is licensed under the GNU Free Documentation License.
It uses material from the article "X86 assembly language".