Scaling Websites to Handle Large Traffic
This article outlines the steps to scale a website to handle increased traffic. If you have a static website hosted on GitHub Pages, for example, and experience a rise in visitors, you'll run into limitations imposed by the static nature of the pages and the restrictions of GitHub's free hosting platform.
Move to dynamic server hosting
Dynamic server hosting refers to a hosting solution that allows websites to generate content dynamically in response to user interactions or requests. Unlike static web hosting, where content is pre-built and delivered as-is, dynamic hosting enables real-time generation of pages, often using server-side scripting languages like PHP, Python, Ruby, or Node.js.
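As a minimal sketch of the idea, the following Python route (assuming the Flask framework) builds its page at request time instead of serving a pre-built file:
# Minimal sketch of dynamic content generation (assumes Python with Flask installed).
from datetime import datetime
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    # The page body is generated on every request rather than served from a static file.
    return f"<h1>Welcome</h1><p>Page generated at {datetime.now().isoformat()}</p>"

if __name__ == "__main__":
    app.run(port=8000)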
Choose a scalable backend infrastructure
Set up a backend server using a framework like Node.js, Django, Ruby on Rails, or Laravel. This allows you to handle user interactions, database queries, authentication, and media management. Begin with a single server but design the system to scale horizontally. As your traffic grows, you'll add more servers to balance the load.
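A rough sketch of such a backend is shown below (Flask and psycopg2 assumed; the DATABASE_URL variable and the posts table are illustrative, not prescribed). The handler keeps no in-process state, so identical copies of it can run behind a load balancer:
# Sketch of a stateless backend endpoint. Flask and psycopg2 are assumed;
# the DATABASE_URL environment variable and "posts" table are illustrative.
import os
import psycopg2
from flask import Flask, jsonify

app = Flask(__name__)

def get_connection():
    # Connection details come from the environment, so every copy of this
    # server is identical and interchangeable behind a load balancer.
    return psycopg2.connect(os.environ["DATABASE_URL"])

@app.route("/posts")
def list_posts():
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, title FROM posts ORDER BY id DESC LIMIT 20")
            rows = cur.fetchall()
    finally:
        conn.close()
    return jsonify([{"id": r[0], "title": r[1]} for r in rows])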
Database scaling
Move from flat files to a database. Choose a relational database like MySQL or PostgreSQL, or a NoSQL database like MongoDB, depending on your data structure. Common database scaling techniques include (a short routing sketch follows the list):
- Sharding: Distribute data across multiple databases to improve read/write performance.
- Replication: Set up master-slave replication to improve read scalability by distributing read queries among multiple database instances.
- Horizontal scaling: Implement database clusters to scale as the data grows, especially for handling high traffic.
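As a rough illustration of replication and sharding (the hostnames, connection helper, and hashing scheme below are assumptions for the sketch, not a specific library's API):
# Rough sketch of read/write splitting (replication) and shard selection (sharding).
# The hostnames and connect() helper are illustrative assumptions.
import hashlib
import psycopg2

PRIMARY = "db-primary.internal"  # all writes go here
READ_REPLICAS = ["db-replica-1.internal", "db-replica-2.internal"]
SHARDS = ["shard-0.internal", "shard-1.internal", "shard-2.internal"]

def connect(host):
    return psycopg2.connect(host=host, dbname="app", user="app")

def replica_for(request_id):
    # Replication: spread read queries across replicas while writes stay on the primary.
    return READ_REPLICAS[request_id % len(READ_REPLICAS)]

def shard_for(user_id):
    # Sharding: hash the user id so a given user's rows always land on the same shard.
    digest = hashlib.sha1(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]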
Load balancing
A single server will not suffice for high traffic. Use load balancing to distribute incoming requests across multiple servers. Popular load balancing techniques include:
- Round-robin: Distributes requests evenly across servers.
- Least connections: Sends requests to the server with the fewest active connections.
- Load balancer solutions: Use a managed service like AWS Elastic Load Balancing (ELB) or software such as HAProxy or Nginx.
Software-based load balancing
Software-based load balancing is the process of distributing incoming network traffic across multiple servers using software tools or services. It operates at the application level and is commonly used in web hosting, cloud computing, and scalable application infrastructures.
A load balancer runs as a service (or inside a container or VM) and listens for client requests. When a request arrives, the load balancer decides which backend server to forward it to based on a predefined algorithm. The backend server processes the request and sends the response back to the client via the load balancer. Load balancing improves application availability, enables horizontal scaling of services, provides fault tolerance by rerouting traffic if a server goes down, and supports health checks so that traffic is only sent to healthy servers.
Common software load balancers include Nginx, HAProxy, Apache HTTP Server (with mod_proxy), Traefik, and Envoy.
Popular load balancing methods
There are multiple approaches to assigning incoming requests among multiple servers.
- Round robin: Incoming requests are distributed evenly across all servers.
- Least connections: Incoming requests go to the server with the fewest active connections.
- IP hash: Incoming requests from a specific client IP always go to the same server.
- Weighted round robin: Servers receive traffic based on assigned weight values.
Software load balancing with Nginx
This section covers load balancing configuration with Nginx.
Round robin load balancing
Round robin is Nginx's default load-balancing method. Here's an example configuration that implements round-robin load balancing across three servers.
upstream backend {
    server 192.168.1.101;
    server 192.168.1.102;
    server 192.168.1.103;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
Least connections load balancing
Here's an example configuration that implements least-connections load balancing across two servers.
upstream backend {
    least_conn;  # send each request to the server with the fewest active connections
    server 192.168.1.101;
    server 192.168.1.102;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
IP hash (sticky sessions) load balancing
Here's an example configuration that implements IP hash (sticky sessions) load balancing across two servers.
upstream backend {
    ip_hash;  # requests from the same client IP are always routed to the same server
    server 192.168.1.101;
    server 192.168.1.102;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
Weighted round robin load balancing
Here's an example configuration that implements weighted round-robin load balancing across two servers.
upstream backend {
    server 192.168.1.101 weight=3;  # receives roughly three times the traffic
    server 192.168.1.102 weight=1;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
Failover load balancing
Failover, where a backup server receives traffic only when the other servers are unavailable, can be configured as follows:
upstream backend {
    server 192.168.1.101;
    server 192.168.1.102 backup; # Used only if others fail
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
Handling a down server
Nginx can stop sending traffic to a server that repeatedly fails to respond. The configuration below marks a server as unavailable for 30 seconds after three failed attempts within that window:
upstream backend {
    server 192.168.1.101 max_fails=3 fail_timeout=30s;  # skipped for 30s after 3 failures within 30s
    server 192.168.1.102 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
Auto-scaling
If you’re using cloud services, enable auto-scaling to automatically increase or decrease server capacity based on traffic. For example, AWS EC2 Auto Scaling adjusts the number of EC2 instances based on predefined thresholds, ensuring you have enough resources during high demand without over-provisioning.
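For instance, a target-tracking policy can be attached to an existing Auto Scaling group with the boto3 AWS SDK; the group name "web-asg" and the 60% CPU target below are illustrative assumptions:
# Sketch: attach a target-tracking scaling policy to an existing Auto Scaling group.
# The group name "web-asg" and the 60% CPU target are illustrative assumptions.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="keep-average-cpu-near-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        # Instances are added when average CPU rises above the target
        # and removed when it stays well below it.
        "TargetValue": 60.0,
    },
)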
Implement content delivery network (CDN)
- Offload static content: A CDN (e.g., Cloudflare, AWS CloudFront) caches and delivers static content (images, videos, stylesheets) from servers closer to the user, reducing latency and offloading traffic from your main servers.
- Dynamic content caching: For frequently accessed dynamic content (such as popular posts), you can cache these responses to reduce server load and improve performance, as sketched after this list.
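A minimal sketch of the caching idea (Flask assumed; the route and 60-second lifetime are illustrative choices) is to mark a popular dynamic response as publicly cacheable so a CDN or shared cache can serve it without reaching the origin each time:
# Sketch: let a CDN or shared cache reuse a frequently requested dynamic response.
# Flask is assumed; the route and 60-second lifetime are illustrative choices.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/popular-posts")
def popular_posts():
    posts = [{"id": 1, "title": "Scaling 101"}]  # placeholder data
    response = jsonify(posts)
    # Public caches (including CDNs) may reuse this response for up to 60 seconds.
    response.headers["Cache-Control"] = "public, max-age=60"
    return response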
Caching for performance
Implement Redis or Memcached to cache frequently requested data, such as user profiles or popular posts. This reduces database load and speeds up response times. Cache database query results for frequently accessed data, reducing the need for repeated expensive database queries.
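A minimal cache-aside sketch with the redis-py client (the fetch_profile_from_db helper, key prefix, and 5-minute TTL are illustrative assumptions):
# Cache-aside sketch using redis-py. fetch_profile_from_db() stands in for a real
# database query; the key prefix and 300-second TTL are illustrative assumptions.
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def fetch_profile_from_db(user_id):
    return {"id": user_id, "name": "example"}  # placeholder for a real query

def get_profile(user_id):
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the database is not touched
    profile = fetch_profile_from_db(user_id)
    cache.setex(key, 300, json.dumps(profile))  # cache miss: store for 5 minutes
    return profile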
Distribute traffic with edge computing
Use edge computing to distribute parts of the application to locations closer to users. This can reduce latency and improve performance for users in different regions. AWS Lambda@Edge or Cloudflare Workers let you run code at edge locations to enhance performance and reduce load on the main servers.
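As a rough sketch, a Lambda@Edge viewer-request function (Python runtime) can inspect or modify requests at the edge before they reach the origin; the custom header name below is an illustrative assumption:
# Rough sketch of an AWS Lambda@Edge viewer-request handler (Python runtime).
# The custom header name is an illustrative assumption.
def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    request["headers"]["x-served-from-edge"] = [
        {"key": "X-Served-From-Edge", "value": "true"}
    ]
    # Returning the request tells CloudFront to continue processing it.
    return request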
Security and rate limiting
With more traffic comes the need for improved security measures. Implement firewalls and rate limiting to protect your servers from malicious traffic or abuse. Use services like AWS Shield or Cloudflare to prevent DDoS attacks.
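One simple rate-limiting approach is a fixed-window counter per client IP, sketched here with redis-py; the 100-requests-per-minute limit is an illustrative assumption:
# Fixed-window rate-limiting sketch using redis-py.
# The 100-requests-per-minute-per-IP limit is an illustrative assumption.
import redis

cache = redis.Redis(host="localhost", port=6379)
LIMIT = 100          # allowed requests per window
WINDOW_SECONDS = 60  # window length

def allow_request(client_ip):
    key = f"ratelimit:{client_ip}"
    count = cache.incr(key)  # atomically count this request
    if count == 1:
        cache.expire(key, WINDOW_SECONDS)  # start the window on the first request
    return count <= LIMIT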
Monitor server performance and traffic
Set up monitoring tools like Prometheus, Grafana, or AWS CloudWatch to monitor server load, response times, and traffic patterns. Implement alerting systems to notify you if server health or load crosses certain thresholds, enabling quick intervention to prevent downtime.
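For example, an application can expose basic request metrics with the prometheus_client library for Prometheus to scrape; the port and metric names below are illustrative assumptions:
# Sketch: expose request metrics for Prometheus to scrape via prometheus_client.
# The port and metric names are illustrative assumptions.
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("http_requests_total", "Total HTTP requests handled")
LATENCY = Histogram("http_request_duration_seconds", "Request latency in seconds")

def handle_request():
    REQUESTS.inc()
    with LATENCY.time():
        time.sleep(0.01)  # placeholder for real request handling

if __name__ == "__main__":
    start_http_server(9100)  # metrics served at http://localhost:9100/metrics
    while True:
        handle_request()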
Author
Anurag Gupta is an M.S. graduate in Electrical and Computer Engineering from Cornell University. He also holds an M.Tech degree in Systems and Control Engineering and a B.Tech degree in Electrical Engineering from the Indian Institute of Technology, Bombay.