The Limits of a Single Server

The capacity of a single server to handle user requests is ultimately determined by the limits of four key resources: network bandwidth, disk, CPU, and memory. When any of these limits is reached, incoming requests must wait their turn to be processed, and queues begin to form. As more requests pile up, waiting times grow, degrading the server's ability to handle user requests efficiently.
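To get a feel for how sharply waiting times grow near saturation, consider a simple queuing model. The sketch below assumes an M/M/1 queue (an illustrative simplification; real servers are more complex) and computes the average time a request spends in the system as utilization rises:

```python
# A minimal sketch of queuing delay under the M/M/1 model (an
# illustrative assumption, not a model of any specific server).
# Average time in system: W = 1 / (mu - lam), where mu is the
# service rate and lam is the arrival rate.

def avg_time_in_system(service_rate: float, arrival_rate: float) -> float:
    """Average seconds a request spends waiting plus being served."""
    if arrival_rate >= service_rate:
        raise ValueError("queue grows without bound at or above saturation")
    return 1.0 / (service_rate - arrival_rate)

service_rate = 100.0  # requests/second the server can process (assumed)
for utilization in (0.5, 0.9, 0.99):
    w = avg_time_in_system(service_rate, utilization * service_rate)
    print(f"utilization {utilization:.0%}: avg time per request {w * 1000:.0f} ms")
```

Under these assumptions, average latency is 20 ms at 50% utilization, 100 ms at 90%, and a full second at 99%: the queue, not the work itself, dominates near the limit.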

Business Logic and Data Layer Separation

A common approach is to split the architecture into two parts: a stateless component that exposes functionality to end users, and a stateful component whose data is managed by a database.

This way, most of the application logic executes on separate servers, each with its own network, CPU, memory, and disk, and only a fraction of all requests needs to reach the database layer.
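A minimal sketch of this split, under assumed names (Database, AppServer, get_user are all hypothetical): the stateless application server holds no authoritative state, keeps only a disposable local cache, and falls through to the database layer on a miss.

```python
# A minimal sketch of the stateless/stateful split.
# All class and method names here are illustrative assumptions.

class Database:
    """The stateful layer: the single source of truth."""
    def __init__(self):
        self._rows = {"alice": {"plan": "pro"}, "bob": {"plan": "free"}}

    def query(self, user_id):
        return self._rows.get(user_id)

class AppServer:
    """The stateless layer: holds no authoritative state, so any
    number of these can run behind a load balancer."""
    def __init__(self, db: Database):
        self._db = db
        self._cache = {}  # disposable; rebuilt from the database on restart

    def get_user(self, user_id):
        if user_id in self._cache:      # served without touching the database
            return self._cache[user_id]
        row = self._db.query(user_id)   # only cache misses reach the database
        self._cache[user_id] = row
        return row

db = Database()
app = AppServer(db)
app.get_user("alice")  # first call hits the database
app.get_user("alice")  # second call is served from the app server's cache
```

Because the application servers hold no authoritative state, more of them can be added freely; the database remains the one component that cannot be duplicated this way, which motivates the next step.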

Partitioning Data

When a software system runs into the physical limits of its hardware, the most effective way to keep serving requests is to partition the data and process each partition on a different server, scaling horizontally by distributing the data across multiple machines.
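One simple way to divide the data is hash partitioning: each key is hashed to pick the server responsible for it. The sketch below uses a modulo scheme for clarity; the server names are assumptions, and production systems typically prefer consistent hashing so that adding or removing servers moves less data.

```python
import hashlib

# A minimal sketch of hash partitioning. The server list and the
# modulo scheme are illustrative assumptions.

SERVERS = ["server-0", "server-1", "server-2"]

def partition_for(key: str) -> str:
    """Map a key deterministically to the server that owns it."""
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(SERVERS)
    return SERVERS[index]

for key in ("alice", "bob", "carol"):
    print(key, "->", partition_for(key))
```

Every node (and every client that knows the scheme) computes the same mapping, so requests for a given key always land on the server that holds its partition.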

A Look at Failures

Once we utilize multiple machines, each with its own disk drives, network interconnects, processors, and memory, the likelihood of failure becomes a significant concern: the more hard disks we run, the more disk failures we will see. In this context, bringing up a server requires careful consideration to ensure that it starts in the correct state and coordinates with other nodes, so that it never serves incorrect or stale data.
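The claim that more disks mean more failures is easy to quantify. Assuming each disk fails independently within a year with probability p (a simplifying assumption; the rate below is illustrative), the chance that at least one of n disks fails is 1 - (1 - p)^n, which grows quickly with n:

```python
# Probability that at least one of n disks fails, assuming each fails
# independently with probability p (a simplifying assumption).

def prob_any_failure(p: float, n: int) -> float:
    return 1.0 - (1.0 - p) ** n

p = 0.03  # illustrative annual failure rate per disk
for n in (1, 10, 100, 1000):
    print(f"{n:>4} disks: P(at least one failure) = {prob_any_failure(p, n):.1%}")
```

With these numbers, a single disk fails 3% of the time, but across 100 disks the chance of seeing at least one failure in a year is about 95%, and across 1,000 disks a failure is a near certainty. At scale, failure stops being an exception and becomes a routine event to design for.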

Replication

Replication plays a crucial role in masking failures and ensuring service availability. If data is replicated on multiple machines, clients can connect to a server that holds a copy of the data even when some machines fail. However, doing this is not as simple as it sounds: the copies must be kept in agreement with one another.
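As a hint of that complexity, the sketch below sends each write to all replicas and, assuming a majority-acknowledgement rule (one common choice, not the only one), reports success only once more than half of them have confirmed. The Replica class and the failure injection are illustrative assumptions; handling replicas that recover later, conflicting writes, and stale reads is where the real difficulty lies.

```python
# A minimal sketch of a majority-acknowledged replicated write.
# Real systems must also handle recovery, ordering, and stale reads.

class Replica:
    def __init__(self, name: str, alive: bool = True):
        self.name = name
        self.alive = alive
        self.data = {}

    def write(self, key, value) -> bool:
        if not self.alive:
            return False        # simulate a failed machine: no acknowledgement
        self.data[key] = value
        return True

def replicated_write(replicas, key, value) -> bool:
    """Succeed only if a majority of replicas acknowledge the write."""
    acks = sum(r.write(key, value) for r in replicas)
    return acks > len(replicas) // 2

replicas = [Replica("r1"), Replica("r2"), Replica("r3", alive=False)]
print(replicated_write(replicas, "user:alice", {"plan": "pro"}))  # True: 2 of 3 acked
```

Notice that the write succeeds even though one replica is down; the unresolved question, which the following chapters take up, is what that replica should do when it comes back and how clients avoid reading its stale copy.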