Information Storage

Rather than accessing individual bits in memory, most computers use blocks of eight bits, or bytes. A machine-level program views memory as a very large array of bytes, referred to as virtual memory. Every byte of memory is identified by a unique number, known as its address. Virtual address space is the set of all possible addresses.

Every computer has a word size. A word size indicates the normal size of integers and pointer data. Since a virtual address is encoded by such a word, word size determines the maximum size of the virtual address space. That is, for a machine with a w-bit word size, virtual addresses can range from \(0\) to \(2^{(w-1)}\), giving the program access to at most \(2^w\) bytes. Thus, 32-bit computers have a limited virtual address space of 4GB.

We generally write bit patterns as base-16, or hexadecimal. Hexadecimal (hex) uses digits 0 through 9 along with characters A through F to represent 16 possible values. In C, number constants starting with 0x are interpreted as hexadecimal.

Byte Ordering

In virtually all machines, a multi-byte object is stored as a contiguous sequence of bytes, with the address of the object given by the smallest address of the bytes used. Two common conventions exist for ordering the bytes of an object. Little endian puts the least significant byte first. Big endian puts the most significant byte first.

Integer Storage

Integers may be unsigned or signed. Unsigned integers are nonnegative numbers. Signed integers may be negative, zero, or positive numbers.

Unsigned

Unsigned binary representation encodes every number between \(0\) and \(2^{(w - 1)}\) with a unique w-bit value. …0000 is 0, …0001 is 1, etc. In C, unsigned literals have a u suffix.

Signed

Signed integers are most commonly represented with the two's complement encoding. The two's complement of an N-bit number is defined as its complement with respect to \(2^N\). For example, the three-bit number 010 has the two's complement 110 because \(010 + 110 = 1000\). The two's complement is calculated by inverting the digits and adding one.

In the two's complement encoding, if the binary number 010 encodes the decimal 2, then its two's complement, 110, encodes the inverse -2. In other words, to reverse the sign of any integer in this scheme, you can take the two's complement of its binary representation.

Two's complement encoding can also be understood as interpreting the most significant bit of a word as a negative weight. For example, \(1011\) will give \(-8 + 2 + 1 = -5\).

In the two's complement encoding, The least representable value is \(-2^{(w-1)}\) and the greatest value is \(2^{(w-1)} - 1\) (the \(-1\) accounts for \(0\)).