OSI and TCP/IP network models

OSI Seven-Layer Model

The OSI seven-layer model is conceptually clear and theoretically complete, but it is complex and impractical, and some functions are repeated in multiple layers. From bottom to top, the layers are:
Physical Layer: The bottom layer of data transmission; covers the physical medium and signaling, such as network cables and network card standards.
Data Link Layer: Defines the basic frame format, how frames are transmitted, and how devices are identified on a link; for example, the MAC address of a network card.
Network Layer: Defines IP addressing and routing; for example, forwarding data between different networks.
Transport Layer: Provides end-to-end data transmission; for example, TCP and UDP.
Session Layer: Establishes, manages, and terminates sessions between applications, so that data belonging to different conversations is kept apart.
Presentation Layer: Handles data format identification and conversion, plus basic compression and encryption.
Application Layer: The applications themselves, including web applications.

TCP/IP Four-Layer Model

The TCP/IP four-layer model is the model that is widely adopted in practice. It can be viewed as a simplified version of the OSI seven-layer model and consists of the following four layers:

Application Layer
Transport Layer
Network Layer
Network Interface Layer
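
To make the relationship between the two models concrete, here is a minimal sketch of the usual mapping as a lookup table. The names `OSI_TO_TCPIP` and `osi_layers_for` are made up for this illustration; the grouping follows the common textbook view in which the top three OSI layers fold into the application layer and the bottom two into the network interface layer.

```python
# Hypothetical lookup table: which TCP/IP layer covers each OSI layer.
# Listed from the top of the OSI stack to the bottom.
OSI_TO_TCPIP = {
    "Application": "Application",
    "Presentation": "Application",
    "Session": "Application",
    "Transport": "Transport",
    "Network": "Network",
    "Data Link": "Network Interface",
    "Physical": "Network Interface",
}


def osi_layers_for(tcpip_layer):
    """Return the OSI layers folded into the given TCP/IP layer, top to bottom."""
    return [osi for osi, tcpip in OSI_TO_TCPIP.items() if tcpip == tcpip_layer]


for layer in ["Application", "Transport", "Network", "Network Interface"]:
    print(f"{layer:18} <- {', '.join(osi_layers_for(layer))}")
```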

Application Layer

The application layer sits above the transport layer and provides services for exchanging information between applications on two terminal devices. It defines the format of the exchanged information and hands each message down to the transport layer for transmission. The data units exchanged at the application layer are called messages.
The application layer therefore only needs to focus on providing application functions to users; typical application layer protocols include HTTP, FTP, Telnet, DNS, and SMTP.
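
As an illustration of "defining the format of information exchange", the sketch below builds a minimal HTTP/1.1 request message and reads its first line back. The helper names `build_http_get` and `parse_request_line` are invented for this example, and the message is deliberately minimal; a real client would send more headers.

```python
def build_http_get(host, path="/"):
    """Build a minimal HTTP/1.1 GET request: the bytes handed to the transport layer."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # the one header HTTP/1.1 requires
        "Connection: close",
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_request_line(message):
    """Split the first line of a request into (method, target, version)."""
    first_line = message.split(b"\r\n", 1)[0].decode("ascii")
    method, target, version = first_line.split(" ")
    return method, target, version


request = build_http_get("example.com", "/index.html")
print(request.decode("ascii"))
print(parse_request_line(request))
```

The application only decides what these bytes look like; how they reach the other host is left to the layers below.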

Moreover, the application layer typically runs in the user space of the operating system, while the transport layer and the layers below it run in kernel space.

Transport Layer

The main task of the transport layer is to provide general data transmission services between processes on two terminal devices. The application process uses this service to transmit application layer messages. "General" means that it is not specific to a particular network application, but multiple applications can use the same transport layer service.
Common protocols of the transport layer:

TCP (Transmission Control Protocol): Provides a connection-oriented, reliable byte-stream transmission service.
UDP (User Datagram Protocol): Provides a connectionless, best-effort data transmission service (it does not guarantee reliable delivery), which makes it simple and efficient.
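
The difference between a byte stream and individual datagrams is easiest to see in code. The following sketch sends the same messages over UDP and over TCP on the loopback interface, using only Python's standard `socket` module. The function names `udp_roundtrip` and `tcp_roundtrip` are made up for this example, and running both ends in one process is a simplification that only works for small messages.

```python
import socket


def udp_roundtrip(messages):
    """Send each message as one UDP datagram over loopback; return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        rx.settimeout(2.0)
        for msg in messages:
            tx.sendto(msg, rx.getsockname())   # no connection set up beforehand
        # Each recvfrom returns exactly one datagram: boundaries are preserved.
        return [rx.recvfrom(2048)[0] for _ in messages]


def tcp_roundtrip(messages):
    """Send the messages over one TCP connection; return the received byte stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            server, _ = listener.accept()      # connection is established first
            with server:
                server.settimeout(2.0)
                for msg in messages:
                    client.sendall(msg)
                client.shutdown(socket.SHUT_WR)  # signal end of stream
                chunks = []
                while True:
                    chunk = server.recv(2048)
                    if not chunk:                # empty read: peer finished sending
                        break
                    chunks.append(chunk)
                # TCP delivers bytes in order, but not message boundaries.
                return b"".join(chunks)


print(udp_roundtrip([b"ab", b"cd"]))
print(tcp_roundtrip([b"ab", b"cd"]))
```

With UDP the receiver gets back separate datagrams, while with TCP it only sees one ordered sequence of bytes, so an application that needs message boundaries over TCP has to define them itself.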

Network Layer

The network layer is responsible for providing communication services between different hosts on a packet-switched network. When sending data, it encapsulates the segments or user datagrams produced by the transport layer into packets. Because the network layer in the TCP/IP architecture uses IP, these packets are also called IP datagrams, or simply datagrams.
Another task of the network layer is route selection: choosing an appropriate path through the network's routers so that the packets handed down by the source host's transport layer reach the destination host.
The Internet consists of a large number of heterogeneous networks interconnected by routers. Its network layer uses the connectionless Internet Protocol (IP) together with a number of routing protocols, so the network layer of the Internet is also called the internet layer or IP layer.
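
Route selection at a single router can be sketched with Python's standard `ipaddress` module. The routing table `ROUTES` and its addresses are invented for this example. The rule applied here, longest-prefix match, is the usual way an IP router picks among overlapping routes; it is added for illustration and is not described in the text above.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1"),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), "10.255.0.1"),
    (ipaddress.ip_network("10.1.2.0/24"), "10.1.2.254"),
]


def next_hop(destination):
    """Pick the next hop for a destination using longest-prefix match."""
    dest = ipaddress.ip_address(destination)
    # Every route whose prefix contains the destination is a candidate.
    matches = [(net, hop) for net, hop in ROUTES if dest in net]
    # The most specific candidate (longest prefix) wins.
    _, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop


for address in ["10.1.2.7", "10.9.9.9", "8.8.8.8"]:
    print(address, "->", next_hop(address))
```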
Common protocols of the network layer:

IP (Internet Protocol): One of the most important protocols in the TCP/IP protocol suite. It defines the format of data packets and handles their addressing and routing so that they can cross networks and reach the correct destination. Two versions are in use today: the older IPv4 and the newer IPv6, which is intended to replace it.
ARP (Address Resolution Protocol): ARP resolves network layer addresses to link layer addresses. At every step of its journey, an IP datagram has to be delivered to a concrete next device, but an IP address is a logical address while a MAC address is a physical one. ARP finds the MAC address that corresponds to a given IP address.
ICMP (Internet Control Message Protocol): A protocol used to transmit network status and error messages, commonly used for network diagnosis and troubleshooting. For example, the Ping tool uses the ICMP protocol to test network connectivity.
NAT (Network Address Translation): NAT translates addresses between an internal network and an external network. Specifically, within a small subnet (LAN), each host has its own private IP address that is unique only inside that LAN, but outside the LAN, in the wide area network (WAN), the hosts share a single public IP address that identifies the position of the LAN on the entire Internet.
OSPF (Open Shortest Path First): An interior gateway protocol (IGP) and a widely used dynamic routing protocol based on the link-state algorithm. It selects the best path by total link cost, which is commonly derived from link bandwidth.
RIP (Routing Information Protocol): An interior gateway protocol (IGP) and a dynamic routing protocol based on the distance-vector algorithm. It uses hop count as the metric, selects the path with the fewest hops as the best path, and treats a hop count of 16 as unreachable.
BGP (Border Gateway Protocol): A routing protocol used to exchange Network Layer Reachability Information (NLRI) between routing domains (autonomous systems). It is highly flexible and scalable.
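
The difference between the RIP and OSPF metrics shows up on a small example. The sketch below is not an implementation of either protocol: it runs a breadth-first search and Dijkstra's algorithm centrally over a made-up topology (`LINKS`) to show which path each kind of metric would prefer once routing has converged. All names and costs are assumptions of the example.

```python
import heapq
from collections import deque

# Hypothetical topology: undirected link -> cost (lower is better).
LINKS = {
    ("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1,
    ("A", "D"): 100,    # direct but slow link
}


def neighbors(node):
    """Yield (neighbor, link cost) pairs for a node."""
    for (u, v), cost in LINKS.items():
        if u == node:
            yield v, cost
        elif v == node:
            yield u, cost


def fewest_hops(src, dst):
    """RIP-style choice: the path with the fewest hops (breadth-first search)."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt, _ in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def lowest_cost(src, dst):
    """OSPF-style choice: the path with the lowest total link cost (Dijkstra)."""
    heap, done = [(0, [src])], set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, link_cost in neighbors(node):
            if nxt not in done:
                heapq.heappush(heap, (cost + link_cost, path + [nxt]))
    return None


print("fewest hops:", fewest_hops("A", "D"))
print("lowest cost:", lowest_cost("A", "D"))
```

On this topology the hop-count metric picks the direct slow link, while the cost-based metric takes the three-hop detour over the fast links.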

Network Interface Layer

We can consider the network interface layer as a combination of the data link layer and the physical layer.

The data link layer is usually abbreviated as the link layer. Data travelling between two hosts always crosses the network one link at a time, and the link layer handles each of those hops. Its function is to assemble the IP datagrams passed down from the network layer into frames and transmit them over the link between two adjacent nodes. Each frame contains the data plus the necessary control information (such as synchronization information, address information, and error control).
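
Framing can be sketched as follows: the payload is wrapped with address information in front and an error-check value behind. The layout below is modeled loosely on Ethernet II (destination MAC, source MAC, EtherType, payload, CRC-32 trailer), but it is a simplified illustration. It skips the preamble and minimum-length padding, and the helper names are invented.

```python
import struct
import zlib

ETHERTYPE_IPV4 = 0x0800


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Header (addresses + upper-layer type) + payload + CRC-32 trailer."""
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)
    body = header + payload
    trailer = struct.pack("<I", zlib.crc32(body))   # error-control information
    return body + trailer


def parse_frame(frame):
    """Verify the trailer, then strip header and trailer from the payload."""
    body, trailer = frame[:-4], frame[-4:]
    if struct.unpack("<I", trailer)[0] != zlib.crc32(body):
        raise ValueError("checksum mismatch: frame is corrupted")
    (ethertype,) = struct.unpack("!H", body[12:14])
    return body[0:6].hex(":"), body[6:12].hex(":"), ethertype, body[14:]


frame = build_frame("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", ETHERTYPE_IPV4, b"hello")
print(parse_frame(frame))
```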
The function of the physical layer is to achieve transparent transmission of bit streams between adjacent computer nodes, as well as to shield the differences between specific transmission media and physical devices as much as possible. The important functions and protocols of the network interface layer are shown in the following figure:

Socket Network Programming Based on the TCP/IP Four-Layer Model

Linux Network Protocol Stack

From the network protocol stack diagram, you can see:

Applications need to interact with the Socket layer through system calls to exchange data.
Below the Socket layer are the transport layer, network layer, and network interface layer.
The bottom layer is the network card driver and the hardware network card device.

Receiving Process

The network card is a hardware device in the computer that is responsible for receiving and sending network packets. When the network card receives a network packet, it uses DMA technology to write the network packet to a specified memory address, which is the Ring Buffer, a circular buffer. Then it will notify the operating system that the network packet has arrived.
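
A circular buffer is a fixed array of slots with a read position and a write position that both wrap around. The sketch below is a plain Python model of that idea, not the kernel's data structure. The class name, the slot count, and the choice to drop a packet when the ring is full are assumptions of the example.

```python
class RingBuffer:
    """Fixed-size circular buffer: the producer writes at tail, the consumer reads at head."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0     # next slot the consumer (kernel) will read
        self.tail = 0     # next slot the producer (network card) will write
        self.count = 0    # number of occupied slots

    def push(self, packet):
        """Store a packet; return False if the ring is full and the packet is dropped."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = packet
        self.tail = (self.tail + 1) % len(self.slots)   # wrap around at the end
        self.count += 1
        return True

    def pop(self):
        """Remove and return the oldest packet, or None if the ring is empty."""
        if self.count == 0:
            return None
        packet = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return packet


ring = RingBuffer(3)
print([ring.push(p) for p in ["p1", "p2", "p3", "p4"]])  # the fourth push finds the ring full
print(ring.pop())          # oldest packet comes out first
print(ring.push("p4"))     # the freed slot is reused
```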

How does it notify the operating system that the network packet has arrived?

Triggering an interrupt: Whenever the network card receives a network packet, it triggers an interrupt to notify the operating system.
However, this causes a problem. When packets arrive at a high rate, interrupts fire just as often, and the time the CPU spends handling them hurts the efficiency of the whole system.
The solution is the NAPI mechanism, which receives network packets with a hybrid of "interrupt and polling". The core idea of NAPI is not to read the data in the interrupt itself, but to use the interrupt only to wake up the data reception service routine, which then processes the data by polling.

What is the NAPI mechanism?

With NAPI, when a network packet arrives, it is written to a specified memory address using DMA. The network card then raises a hardware interrupt to the CPU. When the CPU receives the hardware interrupt request, it looks up the interrupt table and calls the registered interrupt handler function.
The hardware interrupt handler function does the following:

First, it "temporarily masks interrupts". This signals that the kernel already knows there is data in memory, and tells the network card not to notify the CPU again when further packets arrive, which improves efficiency by sparing the CPU a continuous stream of interrupts.
Then, it triggers a "soft interrupt" and restores the previously masked interrupt.

At this point, the work of the hardware interrupt handler function is completed.

What does the soft interrupt do?

The ksoftirqd kernel thread is responsible for handling soft interrupts. When it receives a soft interrupt, it polls for data and processes it.
The ksoftirqd thread retrieves a data frame from the Ring Buffer and represents it as an sk_buff, so that the network protocol stack can process it as a network packet.
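
The interplay of hardware interrupt, soft interrupt, and polling can be modeled in a few lines. The following is a toy simulation, not kernel code. The class `ToyNic`, its `budget` parameter, and all method names are invented. In particular, the toy keeps the receive interrupt masked until a poll round has emptied the ring, which is a simplifying assumption about timing made for this sketch.

```python
from collections import deque


class ToyNic:
    """Toy model of a network card plus its driver; every name here is hypothetical."""

    def __init__(self, budget=4):
        self.ring = deque()          # stands in for the Ring Buffer
        self.irq_enabled = True      # may the card interrupt the CPU?
        self.poll_scheduled = False  # is a soft interrupt pending?
        self.hard_irqs = 0           # how many hardware interrupts were raised
        self.budget = budget         # max packets handled per poll round
        self.delivered = []          # packets handed to the protocol stack

    def packet_arrives(self, packet):
        self.ring.append(packet)     # "DMA" write into the ring
        if self.irq_enabled:
            self.hard_irq_handler()

    def hard_irq_handler(self):
        """Do as little as possible: mask interrupts and defer the real work."""
        self.hard_irqs += 1
        self.irq_enabled = False
        self.poll_scheduled = True   # "trigger the soft interrupt"

    def softirq_poll(self):
        """One poll round: drain up to `budget` packets from the ring."""
        handled = 0
        while self.ring and handled < self.budget:
            self.delivered.append(self.ring.popleft())
            handled += 1
        if not self.ring:            # ring drained: return to interrupt mode
            self.poll_scheduled = False
            self.irq_enabled = True
        return handled


nic = ToyNic(budget=4)
for i in range(10):                  # a burst of 10 packets
    nic.packet_arrives(i)
rounds = 0
while nic.poll_scheduled:
    nic.softirq_poll()
    rounds += 1
print("hardware interrupts:", nic.hard_irqs, "poll rounds:", rounds)
```

In this model a burst of ten packets costs a single hardware interrupt, and the rest of the work is done by polling.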

Network Protocol Stack

First, the packet enters the network interface layer, where its validity is checked. If it is invalid, it is discarded; if it is valid, the type of its upper-layer protocol is determined, such as IPv4 or IPv6. Then the frame header and frame trailer are removed, and the packet is passed to the network layer.
In the network layer, the IP packet is extracted and its next step is decided: either it is handed to the upper layer or it is forwarded. Once it is confirmed that the packet is addressed to the local host, the IP header is checked to see whether the upper-layer protocol is TCP or UDP. Then the IP header is removed, and the packet is passed to the transport layer.
In the transport layer, the TCP header or UDP header is extracted. The four-tuple "source IP, source port, destination IP, destination port" serves as the identifier to find the corresponding socket, and the data is placed in that socket's receive buffer.
Finally, the application layer program calls the Socket interface, "copying" the data from the kernel's Socket receive buffer to the application's buffer, and then wakes up the user process.
At this point, the receiving process of a network packet is complete. The sending process of a network packet can be seen in the left part of the figure; it is the reverse of the receiving process.
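
The layer-by-layer unwrapping described above can be traced on a hand-built packet. The sketch below assembles an Ethernet + IPv4 + UDP packet with `struct`, then peels the headers off one by one and uses the four-tuple to find a "socket", which here is just a dictionary entry holding a list. The function names, the all-zero MAC addresses, and the zeroed checksums are simplifications assumed for this illustration. It handles only IPv4 and UDP and omits the frame trailer.

```python
import socket
import struct

ETHERTYPE_IPV4 = 0x0800
PROTO_UDP = 17


def build_packet(src_ip, src_port, dst_ip, dst_port, payload):
    """Wrap payload in UDP, IPv4 and Ethernet headers (checksums left at 0)."""
    udp = struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20 + len(udp),   # version/header length, TOS, total length
                     0, 0,                     # identification, flags/fragment offset
                     64, PROTO_UDP, 0,         # TTL, upper-layer protocol, checksum
                     socket.inet_aton(src_ip), socket.inet_aton(dst_ip)) + udp
    return bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IPV4) + ip


def receive(frame, sockets):
    """Walk up the stack; return the four-tuple the data was delivered to, or None."""
    # Network interface layer: read the upper-layer type, strip the frame header.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None
    packet = frame[14:]
    # Network layer: read addresses and upper-layer protocol, strip the IP header.
    header_len = (packet[0] & 0x0F) * 4
    if packet[9] != PROTO_UDP:
        return None
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    segment = packet[header_len:]
    # Transport layer: read the ports, look up the socket by four-tuple.
    src_port, dst_port, length, _ = struct.unpack("!HHHH", segment[:8])
    key = (src_ip, src_port, dst_ip, dst_port)
    if key not in sockets:
        return None
    sockets[key].append(segment[8:length])   # place data in the receive buffer
    return key


sockets = {("192.0.2.10", 40000, "192.0.2.20", 53): []}
frame = build_packet("192.0.2.10", 40000, "192.0.2.20", 53, b"query")
print(receive(frame, sockets), sockets)
```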

How many times does memory copy occur when sending network data?

The first copy happens when the system call for sending data is made. The kernel allocates kernel-level sk_buff memory, copies the user's data into it, and adds it to the send buffer.
The second copy happens with the TCP transport protocol when the data passes from the transport layer to the network layer: each sk_buff is cloned. The clone is sent to the network layer and released once it has been sent, while the original sk_buff stays in the transport layer. This is what makes TCP's reliable transmission possible: the original sk_buff is released only when the ACK for this data packet is received.
The third copy occurs only when the sk_buff is larger than the MTU. Additional sk_buffs are allocated, and the original sk_buff is copied into multiple smaller sk_buffs.
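
The arithmetic behind splitting an oversized packet can be sketched as follows. It assumes IPv4 with a 20-byte header and no options. The function name `fragment` and its return format are invented for the example, which models only the sizes and offsets, not the sk_buff handling itself.

```python
def fragment(payload_len, mtu=1500, ip_header_len=20):
    """Return one (offset_in_8_byte_units, data_length, more_fragments) tuple per fragment."""
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the IPv4 fragment offset field counts in 8-byte units.
    max_data = (mtu - ip_header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len    # is another fragment following?
        fragments.append((offset // 8, length, more))
        offset += length
    return fragments


for frag in fragment(4000):
    print(frag)
```

With the default MTU of 1500 bytes, a 4000-byte payload is split into three fragments, while a payload that fits into a single packet comes back as one fragment with no follow-up.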

Ref:
小林 coding
JavaGuide