Questions#
- What are the two elements of information? Bits + context.
- What is a text file composed of? Characters defined by the ASCII standard.
- What are the four stages from source program to object program? Preprocessing, compilation, assembly, linking.
- What are the four components of system hardware? Processor, main memory, I/O devices, and buses.
- Logically, what does main memory look like? A linear array of bytes, each with a unique address.
- What is main memory hardware composed of? Dynamic Random Access Memory (DRAM).
- What is the core of the processor, and what does it mainly do? The Program Counter (PC), which stores the address of the next instruction.
- In the simplest terms, how does the processor run? It executes the instruction pointed to by the PC, then updates the PC.
- What three components do instruction operations revolve around? Main memory, the register file, and the ALU (Arithmetic Logic Unit). The register file stores data and status, and the ALU performs calculations.
- What are the four instruction operations?
  - Load: copy data from main memory into a register or other internal processor storage.
  - Store: copy data from a register or other internal processor storage into main memory.
  - Operate (arithmetic/logical): perform computations inside the processor. Arithmetic operations include addition, subtraction, multiplication, and division; logical operations are bit operations such as AND, OR, NOT, and XOR.
  - Jump: change the order of program execution by overwriting the program counter.
- What are instruction set architecture and microarchitecture? The ISA describes the effect of each instruction; the microarchitecture is how the hardware actually implements it.
- What is the process of the CPU executing an object file? The file is read from disk into registers and then placed into main memory (with DMA, disk data can reach memory without passing through registers); the processor then executes the file's instructions.
- What is the entire process from source program to running result? Source program -> object program -> the processor executes the object program.
- How much faster is main memory than disk? How much faster are registers than main memory? About ten million times; about one hundred times.
- Where is the L1 cache, how big is it, and how fast? What about L2? L1 is on the processor chip, holds tens of thousands of bytes, and is about five times faster than L2; L2 is connected to the CPU via a special bus, holds hundreds of thousands to millions of bytes, and is five to ten times faster than main memory.
- How is the cache used? To hold data that is likely to be accessed frequently.
- How many levels are there in the memory hierarchy, and what is the main idea? Seven levels; each upper level serves as a cache for the level below it.
- What are the three basic abstractions provided by the operating system? Files, virtual memory, and processes.
- What is a process? What are its characteristics? A program that the system is currently executing; at any given moment, a single processor can execute only one process.
- What is the operating system kernel? What does it mainly do? The part of the operating system that always resides in main memory; it manages switching between processes via context switching.
- What is the virtual address space? What is it composed of? Each process sees memory as a virtual address space. From low to high addresses, it consists of program code and data, the runtime heap, shared libraries, the user stack, and kernel virtual memory.
- What is the difference between concurrency, parallelism, and sequential execution? Concurrent execution is interleaved, parallel execution is simultaneous, and sequential execution finishes one task before starting another.
- What is thread-level parallelism? For example, four cores with one thread each give four-way thread parallelism.
- What is hyper-threading? Hyper-threading allows one core to execute two threads. Because hardware such as the program counter and register file exists in multiple copies, the core can execute another thread while one thread waits for data to load. It interleaves different threads to use the hardware effectively.
- What is instruction-level parallelism? An instruction typically requires 20 or more clock cycles, but with pipelining, 2 to 4 instructions can complete per clock cycle.
- What is single instruction, multiple data (SIMD) parallelism? Several simple operations of a single instruction are executed in parallel, at the hardware level.
- What is instruction set architecture? An abstraction under which instructions appear to execute sequentially, while the hardware actually exploits instruction-level parallelism behind the scenes.
1.1 Information is Bits + Context#
Program#
- The lifecycle of a program begins with the source program (source file). The source program is actually a sequence of bits composed of 0s and 1s.
- Text characters are generally represented using the ASCII standard, which represents each character with a one-byte integer value.
- Each line of text in the source file ends with an invisible '\n'.
- Files composed solely of ASCII characters are called text files; others are binary files. For example, .cpp files are text files.
- All information in the system is represented as strings of bits, and the only way to distinguish different data objects is the context in which they appear.
Characteristics of C Language#
- C language is small and simple.
- C language was designed to implement Unix.
- C language is closely related to Unix.
- C language is the preferred choice for system-level programming and is also very suitable for application-level programs.
1.2 Programs are Translated into Different Formats by Other Programs#
```c
#include <stdio.h>

int main() {
    printf("hello, world!\n");
    return 0;
}
```
Four Steps from Source Program to Object Program:#
- The source program is processed by the preprocessor to obtain the modified source program (text file, hello.i).
- That is processed by the compiler to obtain the assembly program (text file, hello.s).
- The assembly program is processed by the assembler to obtain the relocatable object program (binary file, hello.o).
- Finally, the linker links it to obtain the executable object program (binary file, hello).
- Preprocessing Stage: The preprocessor (`cpp`) modifies the original C program according to directives that begin with the `#` character. For example, the directive `#include <stdio.h>` in the first line of `hello.c` tells the preprocessor to read the contents of the header file `stdio.h` and insert it directly into the program text. The result is another C program, usually with a `.i` file extension.
- Compilation Stage: The compiler (`cc1`) translates the text file `hello.i` into the text file `hello.s`, which contains an assembly language program. This program includes the definition of the function `main`, as shown below:

```asm
main:
    subq    $8, %rsp
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    addq    $8, %rsp
    ret
```

- Assembly Stage: Next, the assembler (`as`) translates `hello.s` into machine language instructions, packages these instructions into a format known as a relocatable object program, and stores the result in the object file `hello.o`. The `hello.o` file is a binary file containing the 17 bytes of instruction encoding for the function `main`.
- Linking Stage: Note that the `hello` program calls the `printf` function, which is provided by the standard C library that ships with every C compiler. The `printf` function lives in a separate precompiled object file named `printf.o`. The linker (`ld`) is responsible for merging this file with our `hello.o` file, ultimately producing an executable object file `hello` that can be loaded into memory and executed by the system.
1.3 Understanding How the Compilation System Works is Very Useful#
Uses#
- Optimize program performance.
- Understand errors that occur during linking.
- Avoid security vulnerabilities.
1.4 The Processor Reads and Interprets Instructions Stored in Memory#
The shell is a command line interpreter that outputs a prompt (>>), waiting for a command line input, then executes the command. If the input is the name of an executable file, it executes that file.
1.4.1 Components of the System Hardware#
Mainly includes Bus, I/O Devices, Processor, Main Memory.
Bus#
- The bus can transfer a fixed-length byte block at a time, called a word. In a 64-bit system, the bus can transfer 64 bits (8 bytes) at a time, where a word is 8 bytes.
I/O Devices#
- Each I/O device is connected to the I/O bus through a controller or adapter.
- A controller is a chipset in the device itself or on the motherboard, while an adapter is a card that plugs into a slot on the motherboard.
Main Memory#
- Main memory consists of a set of Dynamic Random Access Memory (DRAM).
- Logically, memory is a linear array of bytes, each byte has a unique address.
Processor#
- The processor is the engine that interprets instructions stored in main memory.
- The core of the processor is a Program Counter (PC).
- The program counter is a storage device of size one word, which stores the address of the next instruction that the CPU is about to execute.
- The processor continuously executes the instruction pointed to by the program counter. After executing each instruction, the program counter updates to point to the next instruction.
- The processor interprets the bits in the instruction according to the instruction execution model (instruction set architecture) and performs the corresponding operations.
- The operations of each instruction revolve around main memory, register file, and arithmetic logic unit (ALU).
Register File#
- The register file consists of a set of word-sized registers, each with a unique name.
ALU#
- Operations of simple instructions:
- Load: Copy a word or byte from main memory to a register, overwriting the original content.
- Store: Copy a word or byte from a register to main memory, overwriting the original content.
- Operate: Copy the contents of two registers to the ALU, which performs arithmetic operations on these two words and stores the result in a register.
- Jump: Extract a word from the instruction and copy it to the program counter, overwriting the original content.
Distinguishing between instruction set architecture and microarchitecture:
- Instruction Set Architecture: The effect of each machine instruction.
- Microarchitecture: How the processor is actually implemented.
1.4.2 Running the Hello Program#
- When executing the object file, the shell reads the file's characters from disk into registers and then places them into main memory (with Direct Memory Access (DMA), data can move from disk to memory without passing through registers). After that, the processor begins executing the machine language instructions of the object file, starting from the `main` routine.
- The entire process: read the file from disk into main memory -> execute the instructions -> load the `hello, world!\n` string from memory into registers -> copy it to the display device -> the output appears on screen.
1.5 Cache is Crucial#
- Reading a word from main memory is 10 million times faster than from disk.
- Reading from the register file is 100 times faster than from main memory.
- Cache is used to address the disparity between the processor and main memory.
- L1 Cache is located on the processor chip, with a capacity of tens of thousands of bytes (tens of KB), and can be accessed nearly as fast as the register file. L1 is about five times faster than L2.
- L2 Cache is connected to the CPU via a special bus, with a capacity ranging from hundreds of thousands to millions of bytes (hundreds of KB to several MB). L2 is 5 to 10 times faster than main memory.
- By allowing the cache to store frequently accessed data, most memory operations can be completed in the cache.
1.6 Storage Devices Form a Hierarchical Structure#
- The memory hierarchy consists of 7 levels; the main idea is that the storage at each level serves as a cache for the level below it.
- From top to bottom, the capacity increases, the speed decreases, and the cost per byte becomes cheaper.
- Level 0: Registers
- Level 1: L1 Cache (SRAM)
- Level 2: L2 Cache (SRAM)
- Level 3: L3 Cache (SRAM)
- Level 4: Main Memory (DRAM)
- Level 5: Local Secondary Storage (Local Disk)
- Level 6: Remote Secondary Storage (Distributed File System, Web Server)
1.7 The Operating System Manages Hardware#
- The two basic functions of the operating system:
- Prevent hardware from being abused by uncontrolled applications.
- Provide applications with a simple and consistent mechanism to control complex low-level hardware devices.
- The three basic abstract concepts applied by the operating system:
- Process: An abstract representation of the processor, main memory, and I/O devices.
- Virtual Memory: An abstract representation of memory and disk.
- File: An abstract representation of I/O devices.
1.7.1 Process#
- Process: An abstraction of the program currently running in the operating system.
- Concurrent Execution: The instructions of different processes are executed in an interleaved manner. A system appears to run multiple processes simultaneously, but on a single processor they are actually running concurrently, not in parallel.
- The operating system achieves concurrent execution through context switching. The context is all the state information needed to track the execution of a process, which may exist in the PC, register file, main memory, etc.
- At any moment, a single processor can only execute the code of one process.
- The operating system kernel is the part of the operating system code that resides in memory, and the transition from one process to another is managed by the kernel.
- The kernel is not an independent process, but a collection of code and data structures.
- When an application needs certain operations from the operating system, it passes control to the kernel, which completes the operation and returns to the application.
1.7.2 Thread#
- A process consists of multiple threads, each thread runs in the context of the process, sharing the same code and global data.
- Sharing data between multiple threads is easier than between multiple processes, and threads are generally more efficient than processes.
1.7.3 Virtual Memory#
- Programs running on the machine view memory as a huge array of bytes, referred to as virtual memory.
- Each byte of memory is identified by an address, and the set of all possible addresses is the virtual address space.
- Virtual memory gives each process the illusion that it has exclusive use of main memory; every process sees the same consistent view of memory, its virtual address space.
- In Linux, the virtual address space seen by each process consists of the following parts:
- Program Code and Data
- Heap (Runtime Heap)
- Shared Libraries
- Stack (User Stack)
- Kernel Virtual Memory
- Addresses increase from low to high, with the highest layer Kernel Virtual Memory containing the code and data of the operating system, which is the same for every process.
- Program Code and Data: For all processes, the code starts from the same fixed address, followed by the data area corresponding to global variables. Both the code and data areas are initialized according to the contents of the executable file. The sizes of the code and data areas are specified when the process starts running.
- Heap: The runtime heap expands and contracts dynamically at runtime as a result of calls to `malloc` and `free`.
- Shared Libraries: The middle part of the address space holds the code and data of shared libraries, such as the C standard library and the math library.
- Stack: Like the heap, the user stack can dynamically expand and contract during program execution, and the compiler uses it to implement function calls. When a function is called, the stack grows, and when returning from the function, the stack shrinks.
1.7.4 File#
- A file is simply a sequence of bytes, nothing more.
- Each I/O device, including disks, keyboards, displays, and networks, can be viewed as a file.
1.8 Systems Communicate Over Networks#
- From the perspective of a single system, the network can be viewed as an I/O device.
- For example, running a program on a remote server involves inputting locally, executing remotely, and sending the execution results back to local output.
1.9 Main Themes#
1.9.1 Amdahl's Law#
- The main point of Amdahl's Law: to significantly speed up the entire system, you must improve the parts that account for a large fraction of the total execution time.
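In formula form: if a fraction $\alpha$ of the original execution time $T_{\text{old}}$ is sped up by a factor of $k$, then

```latex
T_{\text{new}} = (1-\alpha)\,T_{\text{old}} + \frac{\alpha\,T_{\text{old}}}{k},
\qquad
S = \frac{T_{\text{old}}}{T_{\text{new}}} = \frac{1}{(1-\alpha) + \alpha/k}
```

Even as $k \to \infty$, the speedup is bounded by $S = 1/(1-\alpha)$: for example, infinitely accelerating a part that takes 60% of the time yields at most $1/0.4 = 2.5\times$ overall.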
1.9.2 Concurrency and Parallelism#
- Distinguishing between concurrency and parallelism:
- Concurrency: A general concept referring to a system that has multiple activities simultaneously.
- Parallelism: Using concurrency to make the system run faster.
- Parallelism can be applied at multiple abstract levels. From high to low, there are the following three levels:
- Thread-Level Parallelism
- Traditional concurrency is simulated by rapidly switching between processes on a single processor.
- Multi-processor systems are controlled by an operating system managing multiple CPUs. The structure is as follows:
- L1 cache is divided into two parts: one stores the most recently fetched instructions, and the other stores data.
- Hyper-threading, also known as simultaneous multithreading, allows one CPU to execute multiple control flows. Some hardware, such as the program counter and register file, exists in multiple copies, while other hardware, such as the floating-point arithmetic unit, exists only once. A conventional processor needs about 20,000 clock cycles to switch between threads, whereas a hyper-threaded processor can decide which thread to run on a cycle-by-cycle basis; for example, while one thread waits for data to load into the cache, the CPU can execute another thread.
- The i7 processor executes two threads per core, resulting in 4 cores and 8 threads, with all 8 threads executing in parallel.
- Instruction-Level Parallelism
- Each instruction generally requires 20 or more clock cycles from start to finish, but with instruction-level parallelism (pipelining), a sustained rate of 2 to 4 instructions per clock cycle is achievable.
- A processor that can sustain execution rates faster than one instruction per cycle is called a superscalar processor; most modern processors are superscalar.
- Single Instruction, Multiple Data Parallelism
- At the lowest level, modern processors allow one instruction to produce multiple operations that can be executed in parallel, known as single instruction, multiple data parallelism, or SIMD parallelism.
1.9.3 The Importance of Abstraction in Computer Systems#
- Instruction Set Architecture is an abstraction of CPU hardware, where the CPU appears to execute one machine code instruction at a time, but in reality, the underlying hardware executes multiple instructions in parallel.
- Virtual machines are an abstraction of the entire computer system, including the operating system, processor, and programs.