Processes, Threads, and Coroutines#
What are Processes, Threads, and Coroutines#
| | Process | Thread | Coroutine |
|---|---|---|---|
| Definition | The basic unit of resource allocation and ownership | The basic unit of program execution | A lightweight thread in user space; the basic unit of scheduling within a thread |
| What is switched | The CPU environment of the current process (stack, registers, page tables, file handles, etc.) is saved and the CPU environment of the newly scheduled process is set up | The program counter, a small number of registers, and stack contents are saved and restored | The register context and stack are saved, then restored when switching back |
| Switching path | User mode -> kernel mode -> user mode | User mode -> kernel mode -> user mode | User mode only (the kernel is not involved) |
| Owned resources | CPU, memory, files, and handles | Program counter, registers, stack, and status word | Its own register context and stack |
| Concurrency | Different processes are switched onto the CPU for concurrency, or occupy separate CPUs for parallelism | Multiple threads within a process execute concurrently | Only one coroutine executes at a time while the others are suspended; suitable for time-sharing tasks |
| System overhead | Switching virtual address spaces, kernel stacks, and hardware contexts; CPU cache invalidation; page table switching; overhead is high | Only a small number of register contents need to be saved and restored, so overhead is low | Operates directly on the user-space stack with almost no kernel switching overhead; global variables can be accessed without locks, so context switching is very fast |
| Communication | Inter-process communication requires help from the operating system | Threads can communicate by directly reading and writing the process's data segment (e.g., global variables) | |
Switching Issues of Processes, Threads, and Coroutines#
- Process Switching: Process switching is managed and scheduled by the operating system. When the operating system decides to switch to another process, it will save the context of the current process (including register values, program counter, etc.) and load the context of the next process. This process involves switching memory mappings, registers, and other resources, making the cost of switching processes relatively high. Process switching requires intervention from the operating system, so the overhead is relatively large.
- Thread Switching: Thread switching occurs within the same process and is scheduled by the operating system. Thread switching also requires saving and restoring register values, program counters, and other context information, but since threads share the process's address space and resources, the switching cost is relatively low. Thread switching still requires intervention from the operating system, but the overhead is much smaller than that of process switching.
Performance Overhead: switching kernel stacks and hardware contexts, saving register contents and the state of the previously executing flow, and CPU cache invalidation. Page table lookup is slow, so a cache, the TLB, holds frequently used address mappings to speed it up; after a process switch the page table changes and the TLB becomes invalid, so the hit rate drops, virtual-to-physical address translation slows down, and the program runs slower.
- Coroutine Switching: Coroutine switching happens at the application level and is explicitly managed by the coroutine library or the programmer. It does not require intervention from the operating system, so the switching cost is very low: typically only a small amount of context is saved and restored, such as the stack pointer and a few registers. Because coroutine switching is cooperative, control must be explicitly transferred to other coroutines, which makes the switching overhead easier to control.
Performance Overhead: only the CPU register state of the current coroutine has to be saved and the CPU register state of the incoming coroutine loaded, entirely in user mode; a coroutine context switch generally takes at most tens of nanoseconds.
Coroutine switching is faster than thread switching mainly for two reasons:
- Coroutine switching occurs entirely in user space; thread switching involves switching to privileged mode, requiring completion in kernel space;
- Coroutine switching does less work than thread switching, which additionally involves switching between kernel mode and user mode and going through the system call path.
Overall, process switching has the highest overhead, followed by thread switching, while coroutine switching has the lowest overhead. Coroutines can manage switching more flexibly at the application level to improve efficiency and concurrency performance. However, it is important to manage switching points reasonably in coroutines to avoid excessive switching that may incur additional overhead.
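As a concrete illustration of a pure user-mode switch, here is a minimal sketch using the POSIX ucontext API (getcontext/makecontext/swapcontext), assuming a Linux-like system; real coroutine libraries layer scheduling, stack management, and I/O integration on top of this basic mechanism:

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];            // stack for the coroutine

static void coroutine_body(void) {
    printf("coroutine: step 1\n");
    swapcontext(&co_ctx, &main_ctx);        // yield back to main (pure user-mode switch)
    printf("coroutine: step 2\n");
    // returning here resumes the context linked via uc_link (main_ctx)
}

int main(void) {
    getcontext(&co_ctx);                    // initialize the coroutine context
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;             // where to go when the coroutine returns
    makecontext(&co_ctx, coroutine_body, 0);

    printf("main: resume coroutine\n");
    swapcontext(&main_ctx, &co_ctx);        // save main's registers, load the coroutine's
    printf("main: coroutine yielded\n");
    swapcontext(&main_ctx, &co_ctx);        // resume the coroutine where it yielded
    printf("main: coroutine finished\n");
    return 0;
}
```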
How many threads can a process create at most?
In a 32-bit system, the user-mode virtual space is only 3G. If the stack space allocated when creating a thread is 10M, then a process can create at most around 300 threads.
In a 64-bit system, the user-mode virtual address space is as large as 128 TB, so in theory the thread count is not limited by virtual memory; in practice it is limited by system parameters (such as the maximum number of threads and memory mappings) and by performance.
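The practical limit can be probed empirically. The sketch below (assuming Linux and pthreads; the 10 MB value mirrors the stack size used in the example above) keeps creating idle threads until pthread_create fails; the count it reports depends on ulimit settings, kernel parameters such as threads-max and max_map_count, and available memory, and running it will consume significant resources:

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *idle(void *arg) { pause(); return NULL; }   // each thread just sleeps

int main(void) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 10 * 1024 * 1024);  // 10 MB stack per thread, as in the text

    long count = 0;
    pthread_t t;
    while (pthread_create(&t, &attr, idle, NULL) == 0)   // stop at the first failure (e.g. EAGAIN)
        count++;

    printf("created %ld threads before failure\n", count);
    return 0;
}
```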
Thread Crash, Process Crash
Generally, if a thread crashes due to an illegal memory access, the whole process crashes. Why does the operating system let the process crash? Mainly because all threads in a process share the same address space. Since the space is shared, an illegal access by one thread leaves memory in an uncertain state that may affect other threads. The operating system considers such an operation dangerous enough to cause a chain of serious consequences, so it simply terminates the entire process.
How does the process crash? Through the signal mechanism: the kernel sends the SIGSEGV signal to the process, and the default action of SIGSEGV is to terminate the process.
Why does a thread crash not lead to a JVM process crash?
Because the JVM installs its own signal-handling functions and intercepts SIGSEGV: for example, a null pointer dereference in Java code is converted into a NullPointerException instead of bringing down the whole JVM process.
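Any process can install its own SIGSEGV handler with sigaction, which is the mechanism the JVM relies on. The sketch below (assuming a POSIX system) only logs the fault and exits; actually recovering and continuing, as the JVM does, takes considerably more machinery:

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)info; (void)ctx;
    // Only async-signal-safe calls here: write(), not printf().
    const char msg[] = "caught SIGSEGV in our own handler instead of the default crash\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);                               // leave cleanly instead of dumping core
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);          // install our own SIGSEGV handler

    volatile int *p = NULL;
    *p = 42;                                // illegal memory access triggers SIGSEGV
    return 0;
}
```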
What are the communication methods?#
What are the methods of inter-process communication?#
- Pipes/Anonymous Pipes: Used for communication between related parent and child processes or sibling processes (a minimal sketch follows this list).
- Named Pipes: Anonymous pipes do not have names and can only be used for communication between related processes. To overcome this limitation, named pipes were introduced. Named pipes strictly follow First In First Out (FIFO). Named pipes exist as disk files and can facilitate communication between any two processes on the same machine.
- Signals: Signals are a more complex communication method used to notify the receiving process that a certain event has occurred.
- Message Queuing: A message queue is a linked list of messages, stored in memory with a specific format and identified by a message queue identifier. The communication data for pipes and message queues follows the FIFO principle. Unlike pipes (anonymous pipes: files that only exist in memory; named pipes: exist on actual disk media or file systems), message queues are stored in the kernel and will only be truly deleted when the kernel is restarted (i.e., the operating system is restarted) or when a message queue is explicitly deleted. Message queues allow for random querying of messages; messages do not necessarily have to be read in FIFO order and can also be read by message type, providing advantages over FIFO. Message queues overcome the limitations of signals carrying little information and pipes being limited by buffer size.
- Semaphores: A semaphore is a counter used for multi-process access to shared data, intended for inter-process synchronization. This communication method is mainly used to solve synchronization-related issues and avoid race conditions.
- Shared Memory: Allows multiple processes to access the same memory space, enabling different processes to see updates to shared memory data in real-time. This method requires some form of synchronization, such as mutexes and semaphores. It can be said to be the most useful method of inter-process communication.
- Sockets: Mainly used for communication between clients and servers over a network. A socket is the basic unit of network communication supporting TCP/IP and serves as an endpoint for bidirectional communication between processes on different hosts; put simply, it is a convention between the two communicating parties, and the socket API functions are used to carry out the communication.
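As referenced in the pipes item above, here is a minimal anonymous-pipe sketch (assuming a POSIX system): the parent writes a message and the related child process reads it:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];                              // fd[0] = read end, fd[1] = write end
    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                         // child: reader
        close(fd[1]);                       // close the unused write end
        char buf[64];
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("child received: %s\n", buf);
        close(fd[0]);
        return 0;
    }

    close(fd[0]);                           // parent: writer, close the unused read end
    const char *msg = "hello from parent";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);                           // closing the write end signals EOF to the reader
    waitpid(pid, NULL, 0);
    return 0;
}
```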
Communication between multiple threads?#
Since multiple threads share the resources of a process, communication between threads focuses more on how to handle conflicts in resource usage.
Mutual Exclusion and Synchronization#
In a single-core CPU system, to create the illusion of multiple programs running simultaneously, the operating system typically uses time-slicing: each process runs for one time slice, and when the slice is used up the next process runs. Because each slice is short, this produces the appearance of "concurrency". If multiple threads compete for shared resources without effective protection, the shared data can become corrupted.
Mutual Exclusion
The situation described above is called a race condition. When multiple threads compete to operate on a shared variable and, by bad luck, a context switch happens in the middle of the operation, we get an incorrect result; in fact, each run may produce a different result, so the output is nondeterministic.
Since the code segment that multiple threads execute to operate on shared variables may lead to a race condition, we refer to this segment of code as a critical section; it is a code snippet that accesses shared resources and must not be executed simultaneously by multiple threads.
We want this code to be mutually exclusive: while one thread is executing inside the critical section, other threads must be prevented from entering it. In short, at most one thread may be executing this code at any time.
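The sketch below (assuming pthreads) makes the race condition concrete: two threads increment a shared counter inside an unprotected critical section, so the final value is usually less than expected and varies between runs:

```c
#include <pthread.h>
#include <stdio.h>

static long counter;                        // shared variable

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        counter++;                          // critical section, not protected
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 200000)\n", counter);   // usually less than 200000
    return 0;
}
```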
Synchronization
Mutual exclusion solves the problem of concurrent processes/threads accessing the critical section. This interaction based on critical sections is relatively simple; as soon as one process/thread enters the critical section, other processes/threads attempting to enter the critical section will be blocked until the first process/thread leaves the critical section.
We know that in multithreading, each thread does not necessarily execute in order; they progress independently and unpredictably. However, sometimes we want multiple threads to closely cooperate to achieve a common task.
Synchronization refers to the situation where concurrent processes/threads may need to wait for each other and exchange messages at certain critical points. This mutual restraint of waiting and exchanging information is called process/thread synchronization.
Implementation and Use of Mutual Exclusion and Synchronization#
Locks
Using lock and unlock operations can solve the mutual exclusion problem of concurrent threads/processes.
Any thread wishing to enter the critical section must first perform the lock operation. If the lock operation is successful, the thread can enter the critical section; after accessing the critical resource, it performs the unlock operation to release that critical resource.
Depending on the implementation of the lock, it can be divided into "busy-wait locks" and "non-busy-wait locks."
Busy-Wait Lock
A "busy-wait lock" is based on atomic operation instructions - Test-and-Set instructions. When the lock cannot be obtained, the thread will continuously loop without doing anything, hence it is called a "busy-wait lock," also known as a spin lock.
Non-Busy-Wait Lock
When the lock cannot be obtained, the current thread is placed in the lock's waiting queue, and the scheduler is executed to give the CPU to other threads.
Semaphore
A semaphore is a method provided by the operating system to coordinate access to shared resources.
Typically, a semaphore represents the number of resources, and the corresponding variable is an integer (sem) variable.
Additionally, there are two atomic operation system call functions to control the semaphore, which are:
- P operation: Decreases sem by 1; if sem < 0 after subtraction, the process/thread enters a blocked waiting state; otherwise, it continues, indicating that the P operation may block;
- V operation: Increases sem by 1; if sem <= 0 after addition, it wakes up a waiting process/thread, indicating that the V operation will not block;
The P operation is used before entering the critical section, and the V operation is used after leaving the critical section; these two operations must appear in pairs.
How to use semaphores to achieve mutual access to the critical section
Set a semaphore s for each type of shared resource, with an initial value of 1, indicating that the critical resource is unoccupied.
As long as the operation to enter the critical section is placed between P(s) and V(s), mutual exclusion of processes/threads can be achieved:
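A minimal sketch using POSIX semaphores is shown below (assuming pthreads). Note one difference from the textbook counter described here: a POSIX semaphore's value never goes negative; waiting threads simply block when it reaches 0, but the mutual exclusion behaviour is the same:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t s;                 // the mutual-exclusion semaphore
static long counter;            // shared resource

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        sem_wait(&s);           // P(s): enter the critical section
        counter++;              // critical section: access the shared resource
        sem_post(&s);           // V(s): leave the critical section
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&s, 0, 1);         // initial value 1: the critical resource is unoccupied
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   // always 200000 with the semaphore
    sem_destroy(&s);
    return 0;
}
```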
At this point, any thread wishing to enter the critical section must first perform the P operation on the mutual-exclusion semaphore, and after accessing the critical resource it performs the V operation. Because the semaphore's initial value is 1, the critical resource is free, so the first thread's P operation succeeds, the value of s becomes 0, the resource is allocated to that thread, and it enters the critical section.
If a second thread wishes to enter the critical section at this time, it must also perform the P operation, resulting in s becoming negative, which means the critical resource is occupied, and thus the second thread is blocked.
Moreover, only after the first thread performs the V operation to release the critical resource and restore the value of s to 0 will the second thread be awakened, allowing it to enter the critical section. Once it completes its access to the critical resource, it performs the V operation, restoring s to its initial value of 1.
For two concurrent threads, the value of the mutual semaphore can only be 1, 0, or -1, representing:
- If the mutual semaphore is 1, it indicates that no thread is in the critical section;
- If the mutual semaphore is 0, it indicates that one thread is in the critical section;
- If the mutual semaphore is -1, it indicates that one thread is in the critical section while another thread is waiting to enter.
Using the mutual semaphore method ensures that only one thread executes in the critical section at any time, achieving mutual exclusion.
How to use semaphores to achieve event synchronization
The synchronization method is to set a semaphore with an initial value of 0. Using the "eating-cooking" synchronization example:
The mother first asks whether her son wants to eat by executing P(s1). Since s1's initial value is 0, it becomes -1, meaning the son is not hungry yet, so the mother thread blocks and waits.
When the son gets hungry, he executes V(s1), changing the s1 semaphore from -1 to 0, indicating that the son needs to eat, thus waking up the blocked mother thread, who then starts cooking.
Next, the son thread executes P(s2), which is equivalent to asking the mother if the food is ready. Since s2's initial value is 0, it becomes -1, indicating that the mother has not finished cooking, and the son thread enters a waiting state.
Finally, when the mother finishes cooking, she executes V(s2), changing the s2 semaphore from -1 back to 0, waking up the waiting son thread, who can then proceed to eat.
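A runnable sketch of this eating-cooking example, assuming pthreads and POSIX semaphores, with s1 meaning "the son is hungry" and s2 meaning "the food is ready", both initialized to 0:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

static sem_t s1;    // signalled by the son when he is hungry
static sem_t s2;    // signalled by the mother when the food is ready

static void *mother(void *arg) {
    (void)arg;
    sem_wait(&s1);                  // P(s1): wait until the son is hungry
    printf("mother: cooking...\n");
    sleep(1);
    printf("mother: food is ready\n");
    sem_post(&s2);                  // V(s2): announce that the food is ready
    return NULL;
}

static void *son(void *arg) {
    (void)arg;
    sleep(1);
    printf("son: I am hungry\n");
    sem_post(&s1);                  // V(s1): wake the waiting mother
    sem_wait(&s2);                  // P(s2): wait until the food is ready
    printf("son: eating\n");
    return NULL;
}

int main(void) {
    pthread_t tm, ts;
    sem_init(&s1, 0, 0);
    sem_init(&s2, 0, 0);
    pthread_create(&tm, NULL, mother, NULL);
    pthread_create(&ts, NULL, son, NULL);
    pthread_join(tm, NULL);
    pthread_join(ts, NULL);
    return 0;
}
```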
Classic Synchronization Problems#
Producer-Consumer Problem#
The producer-consumer problem is described as follows:
- The producer generates data and places it in a buffer;
- The consumer retrieves data from the buffer for processing;
- At any time, only one producer or consumer can access the buffer;
Analyzing the problem leads to the conclusion:
- At any time, only one thread can operate on the buffer, indicating that operations on the buffer are critical code that requires mutual exclusion;
- When the buffer is empty, the consumer must wait for the producer to generate data; when the buffer is full, the producer must wait for the consumer to retrieve data, indicating that the producer and consumer need synchronization.
Thus, we need three semaphores:
- Mutual semaphore mutex: Used for mutual access to the buffer, initialized to 1;
- Resource semaphore fullBuffers: Used for the consumer to inquire whether there is data in the buffer; initialized to 0 (indicating the buffer is initially empty);
- Resource semaphore emptyBuffers: Used for the producer to inquire whether there is space in the buffer; initialized to n (the size of the buffer).
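A sketch of the bounded-buffer solution using the three semaphores listed above, assuming pthreads and POSIX semaphores and a hypothetical buffer of 5 slots:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 5                            // buffer size

static int buffer[N];
static int in, out;                    // next write / read positions
static sem_t mutex;                    // mutual exclusion on the buffer, init 1
static sem_t fullBuffers;              // number of filled slots, init 0
static sem_t emptyBuffers;             // number of empty slots, init N

static void *producer(void *arg) {
    (void)arg;
    for (int item = 0; item < 10; item++) {
        sem_wait(&emptyBuffers);       // P: wait for a free slot
        sem_wait(&mutex);              // P: enter the critical section
        buffer[in] = item;
        in = (in + 1) % N;
        sem_post(&mutex);              // V: leave the critical section
        sem_post(&fullBuffers);        // V: one more item available
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 10; i++) {
        sem_wait(&fullBuffers);        // P: wait for an item
        sem_wait(&mutex);
        int item = buffer[out];
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&emptyBuffers);       // V: one more free slot
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&mutex, 0, 1);
    sem_init(&fullBuffers, 0, 0);
    sem_init(&emptyBuffers, 0, N);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```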
Dining Philosophers Problem#
Let's first look at the description of the dining philosophers problem:
- 5 philosophers are idly sitting around a round table eating noodles;
- Coincidentally, there are only 5 forks on the table, one fork placed between each pair of philosophers;
- The philosophers think first, and when they get hungry, they want to eat;
- Oddly, these philosophers require two forks to eat, meaning they need to pick up the forks on both sides before they can eat;
- After eating, they will put the two forks back and continue thinking;
An array state is used to record each philosopher's state, which is one of three: eating, thinking, or hungry (trying to pick up the forks).
A philosopher can only enter the eating state when both neighbors are not eating.
The left and right neighbors of philosopher i are defined by macros LEFT and RIGHT:
- LEFT : ( i + 5 - 1 ) % 5
- RIGHT : ( i + 1 ) % 5
For example, if i is 2, then LEFT is 1 and RIGHT is 3.
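A sketch of the state-array solution described above (in the style of the classic textbook version), assuming 5 philosopher threads with pthreads and POSIX semaphores; a per-philosopher semaphore lets a hungry philosopher block until both neighbours have put their forks down:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

#define N        5
#define LEFT(i)  (((i) + N - 1) % N)
#define RIGHT(i) (((i) + 1) % N)

enum { THINKING, HUNGRY, EATING };

static int             state[N];
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;   // protects state[]
static sem_t           s[N];                                 // one semaphore per philosopher

// Grant the forks to philosopher i if it is hungry and neither neighbour is eating.
static void test(int i) {
    if (state[i] == HUNGRY &&
        state[LEFT(i)] != EATING && state[RIGHT(i)] != EATING) {
        state[i] = EATING;
        sem_post(&s[i]);            // wake philosopher i (or pre-grant the forks)
    }
}

static void take_forks(int i) {
    pthread_mutex_lock(&mutex);
    state[i] = HUNGRY;
    test(i);                        // try to take both forks now
    pthread_mutex_unlock(&mutex);
    sem_wait(&s[i]);                // block here if the forks were not granted
}

static void put_forks(int i) {
    pthread_mutex_lock(&mutex);
    state[i] = THINKING;
    test(LEFT(i));                  // a neighbour may now be able to eat
    test(RIGHT(i));
    pthread_mutex_unlock(&mutex);
}

static void *philosopher(void *arg) {
    int i = *(int *)arg;
    for (int round = 0; round < 3; round++) {
        usleep(1000);               // think
        take_forks(i);
        printf("philosopher %d is eating\n", i);
        usleep(1000);               // eat
        put_forks(i);
    }
    return NULL;
}

int main(void) {
    pthread_t t[N];
    int id[N];
    for (int i = 0; i < N; i++) { sem_init(&s[i], 0, 0); id[i] = i; }
    for (int i = 0; i < N; i++) pthread_create(&t[i], NULL, philosopher, &id[i]);
    for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
    return 0;
}
```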
Reader-Writer Problem#
The reader-writer problem is described as follows:
- "Read-Read" allowed: At the same time, multiple readers are allowed to read;
- "Read-Write" mutual exclusion: Readers can only read when there are no writers, and writers can only write when there are no readers;
- "Write-Write" mutual exclusion: Writers can only write when there are no other writers.
Fairness strategy:
- Equal priority;
- Mutual exclusion access for writers and readers;
- Only one writer can access the critical section;
- Multiple readers can access the critical resource simultaneously.
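In practice, read-read concurrency with read-write and write-write exclusion can be obtained directly from POSIX read-write locks, as in the sketch below; whether readers or writers are favoured (the fairness strategy above) depends on the implementation and lock attributes:

```c
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_value;

static void *reader(void *arg) {
    (void)arg;
    pthread_rwlock_rdlock(&rwlock);        // many readers may hold the lock at once
    printf("reader sees %d\n", shared_value);
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

static void *writer(void *arg) {
    (void)arg;
    pthread_rwlock_wrlock(&rwlock);        // exclusive: no readers, no other writers
    shared_value++;
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

int main(void) {
    pthread_t r1, r2, w;
    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r1, NULL, reader, NULL);
    pthread_create(&r2, NULL, reader, NULL);
    pthread_join(w, NULL);
    pthread_join(r1, NULL);
    pthread_join(r2, NULL);
    return 0;
}
```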
Deadlock#
Conditions for Deadlock#
Deadlock can only occur when all of the following four conditions are met:
- Mutual exclusion condition;
- Hold and wait condition;
- No preemption condition;
- Circular wait condition;
How to Diagnose Deadlock#
Java - jstack
C++ - Valgrind
Python - Third-party libraries deadlock-detector and deadlocks
Go - go tool trace
How to Avoid Deadlock#
Using the resource ordering allocation method to break the circular wait condition.
So what is the resource ordering allocation method?
Threads A and B must acquire resources in the same order. When thread A first attempts to acquire resource A and then resource B, thread B must also first attempt to acquire resource A and then resource B. In other words, threads A and B always request their desired resources in the same order.
We can modify the previously deadlocked code using the resource ordering allocation method without changing thread A's code.
We first need to clarify the order in which thread A acquires resources; it first acquires mutex lock A and then mutex lock B.
Thus, we only need to change thread B to acquire resources in the same order to break the deadlock.
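The deadlocked code referred to above is not reproduced here, so the following is a hypothetical sketch of the resource ordering allocation method (assuming pthreads): both threads acquire mutex_A before mutex_B, so a circular wait can never form:

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mutex_A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex_B = PTHREAD_MUTEX_INITIALIZER;

static void *thread_A(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex_A);          // always A first...
    pthread_mutex_lock(&mutex_B);          // ...then B
    printf("thread A holds A and B\n");
    pthread_mutex_unlock(&mutex_B);
    pthread_mutex_unlock(&mutex_A);
    return NULL;
}

static void *thread_B(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex_A);          // same order as thread A (the fix);
    pthread_mutex_lock(&mutex_B);          // locking B before A here could deadlock
    printf("thread B holds A and B\n");
    pthread_mutex_unlock(&mutex_B);
    pthread_mutex_unlock(&mutex_A);
    return NULL;
}

int main(void) {
    pthread_t ta, tb;
    pthread_create(&ta, NULL, thread_A, NULL);
    pthread_create(&tb, NULL, thread_B, NULL);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```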
Optimistic Lock and Pessimistic Lock#
Pessimistic locking assumes that the probability of multiple threads modifying shared resources simultaneously is high, so conflicts are likely to occur. Therefore, before accessing shared resources, a lock must be acquired.
Optimistic locking assumes that the probability of conflict is low. Its working method is: first modify the shared resource, then verify whether a conflict has occurred during that time. If no other thread has modified the resource, the operation is completed. If it is found that another thread has already modified the resource, the current operation is abandoned.
Scenario example: Online documents:
To implement simultaneous editing by multiple users, optimistic locking is used. It allows multiple users to open the same document for editing. After editing, it verifies whether the modified content has conflicts.
How is a conflict defined? For example, if user A first edits a document in the browser, and then user B also opens the same document in the browser for editing, but user B submits earlier than user A, user A is unaware of this process. When A submits the modified content, conflicts will occur in the areas modified in parallel between A and B.
How does the server verify whether a conflict has occurred? The typical solution is as follows:
Since the probability of conflict is low, users are allowed to edit the document first, but the browser records the document version number returned by the server when downloading the document;
When the user submits modifications, the request sent to the server will include the original document version number. The server will compare it with the current version number upon receipt. If the version numbers do not match, the submission fails; if the version numbers match, the modification is successful, and the server updates the version number to the latest.
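A hypothetical sketch of this version-number check (the names doc_t and try_commit are illustrative, not a real API): editing happens without holding any lock, and only the short compare-and-bump commit step is serialized:

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <stdio.h>

typedef struct {
    long version;                  // bumped on every successful commit
    char content[256];
    pthread_mutex_t lock;          // protects only the commit step
} doc_t;

// Succeeds only if nobody has committed since the client downloaded base_version.
static bool try_commit(doc_t *d, long base_version, const char *new_content) {
    bool ok = false;
    pthread_mutex_lock(&d->lock);
    if (d->version == base_version) {       // no conflicting edit happened in between
        strncpy(d->content, new_content, sizeof(d->content) - 1);
        d->version++;                       // server moves to the latest version
        ok = true;
    }
    pthread_mutex_unlock(&d->lock);
    return ok;                              // false => conflict, the client must retry
}

int main(void) {
    doc_t doc = { .version = 1, .content = "v1", .lock = PTHREAD_MUTEX_INITIALIZER };
    long base = doc.version;                // version the "browser" downloaded

    try_commit(&doc, base, "edit by B");    // another user commits first...
    // ...so this commit against the stale version fails and must be retried.
    printf("A's commit %s\n", try_commit(&doc, base, "edit by A") ? "succeeded" : "conflicted");
    return 0;
}
```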
In fact, the SVN and Git we commonly encounter also use the idea of optimistic locking, allowing users to edit code first and then determining whether conflicts have occurred through version numbers when submitting. In the event of a conflict, we need to modify it ourselves and then resubmit.
Although optimistic locking eliminates the need for lock and unlock operations, once a conflict occurs, the cost of retrying is very high. Therefore, it should only be considered in scenarios where the probability of conflict is very low and the cost of locking is very high.