
Part 2. L4 Load Balancer bind/unbind and connection lifecycle

Using the TCP lifecycle as the frame of reference, we dissect which connection-state changes bind, unbind, and connection draining actually cause in an L4 load balancer.

Series: The Complete Guide to Graceful Drain

A 7-part series. You are reading Part 2.


To apply L4 load balancing, Connection Draining, Graceful Shutdown, and Zero Downtime Deployment correctly in practice, you first need to understand what bind/unbind changes at the packet level. This article walks through the connection lifecycle for an L4 -> Nginx -> App -> WebSocket stack.

Versions covered

  • Linux Kernel 5.15+
  • Nginx 1.25+
  • HAProxy 2.8+ or equivalent L4 equipment
  • JVM 21

1. What bind/unbind actually changes

Key takeaways

  • bind registers the server as a “backend candidate capable of receiving new connections.”
  • unbind blocks only new connections; existing connections can either be kept or forcibly closed.
  • Operationally, what matters is not the unbind itself but the policy applied after it (drain vs. hard close).

Detailed description

From an L4 perspective, bind/unbind is a change in the set of routing destinations.

  • bind: add the VM to the hash/round-robin target pool.
  • unbind: remove the VM from the target pool.
  • drain: stop assigning new TCP 3-way handshakes (SYNs) to the VM being removed.
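A minimal sketch of what these operations do to the target pool, assuming a simple round-robin selector. `TargetPool`, `TargetState`, and the `drain` flag on `unbind` are hypothetical names for illustration, not any vendor's API; the point is only that new connections are chosen from BOUND targets, while the drain flag decides what happens to existing sessions.

```python
import itertools
from enum import Enum


class TargetState(Enum):
    BOUND = "BOUND"        # eligible for new connections (new SYNs)
    DRAINING = "DRAINING"  # unbound with drain: no new SYNs, existing sessions kept
    UNBOUND = "UNBOUND"    # fully removed: nothing is forwarded any more


class TargetPool:
    """Hypothetical L4 target pool; names are illustrative, not a vendor API."""

    def __init__(self):
        self.states = {}
        self._counter = itertools.count()

    def bind(self, vm):
        # bind: the VM becomes a candidate for new connections.
        self.states[vm] = TargetState.BOUND

    def unbind(self, vm, drain=True):
        # unbind only changes where *new* connections go; the drain flag
        # decides whether existing sessions are kept (DRAINING) or cut (UNBOUND).
        self.states[vm] = TargetState.DRAINING if drain else TargetState.UNBOUND

    def pick_for_new_connection(self):
        # Round robin over BOUND targets only; DRAINING/UNBOUND never get a new SYN.
        bound = sorted(vm for vm, s in self.states.items() if s is TargetState.BOUND)
        if not bound:
            raise RuntimeError("no BOUND target for a new connection")
        return bound[next(self._counter) % len(bound)]


pool = TargetPool()
pool.bind("vm-a")
pool.bind("vm-b")
before = [pool.pick_for_new_connection() for _ in range(4)]
pool.unbind("vm-a", drain=True)
after = [pool.pick_for_new_connection() for _ in range(4)]
print(before)  # ['vm-a', 'vm-b', 'vm-a', 'vm-b']
print(after)   # ['vm-b', 'vm-b', 'vm-b', 'vm-b']
```

Note that nothing in the pool itself tracks existing sessions; that lives in the data plane, which is why the drain policy, not the unbind call, determines whether clients see errors.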

In other words, unbind is a control-plane operation; whether users actually see failures is determined by how existing sessions are handled in the data plane.

  • drain on: existing sessions are allowed to end with a normal FIN-based close.
  • drain off or forced shutdown: the likelihood of RST rises.
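The difference between the two outcomes can be reproduced locally at the socket level. This is a minimal sketch, assuming Linux and Python 3.8+: one side closes an ESTABLISHED connection either normally (FIN) or abortively via `SO_LINGER` with a zero timeout (RST), and the peer's `recv()` shows which one arrived. It illustrates FIN vs. RST in general; it is not a model of how any particular L4 device implements a hard close.

```python
import socket
import struct


def close_and_observe(hard_close):
    """Return what the peer sees when the server side closes an ESTABLISHED socket."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    server_side, _ = listener.accept()
    listener.close()

    if hard_close:
        # SO_LINGER with l_onoff=1, l_linger=0: close() aborts the connection with RST.
        server_side.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    server_side.close()  # without the linger option, close() sends a normal FIN

    try:
        data = client.recv(1024)
        return "FIN (recv returned b'')" if data == b"" else f"data {data!r}"
    except ConnectionResetError:
        return "RST (ConnectionResetError)"
    finally:
        client.close()


print(close_and_observe(hard_close=False))
print(close_and_observe(hard_close=True))
```

The first call should report a clean end of stream, the second a connection reset: the same distinction a client sees when a draining backend finishes normally versus being cut off.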

Practical tips

  • Always fix the maintenance sequence as unbind -> drain -> terminate.
  • During drain, check both that new connections are 0 and that active connections are decreasing.
  • Make the drain-status check API a mandatory gate in the deployment pipeline.
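The last two tips can be combined into a single pipeline gate. This is a minimal sketch under stated assumptions: `read_sample` is a hypothetical adapter around whatever drain-status API your L4 device exposes, and `new_connections` is assumed to mean "new connections since the previous sample", not a cumulative counter. The gate fails fast if new connections keep arriving, warns if active connections grow, and only returns True once the target is empty.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrainSample:
    new_connections: int     # connections newly assigned to the target since the last sample
    active_connections: int  # sessions still ESTABLISHED on the target


def wait_for_drain(read_sample: Callable[[], DrainSample], timeout_s: float,
                   interval_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Pipeline gate: return True once the target is drained, False on timeout.

    read_sample is a hypothetical hook, not a real vendor call.
    """
    deadline = clock() + timeout_s
    previous_active = None
    while clock() < deadline:
        sample = read_sample()
        if sample.new_connections != 0:
            # unbind has not taken effect: abort instead of waiting.
            raise RuntimeError(f"target still receives new connections: {sample.new_connections}")
        if sample.active_connections == 0:
            return True  # only now is it safe to terminate
        if previous_active is not None and sample.active_connections > previous_active:
            print(f"warning: active connections grew {previous_active} -> {sample.active_connections}")
        previous_active = sample.active_connections
        sleep(interval_s)
    return False  # timed out: decide explicitly whether to force-close (RST risk)


samples = iter([DrainSample(0, 120), DrainSample(0, 35), DrainSample(0, 0)])
print(wait_for_drain(lambda: next(samples), timeout_s=60, sleep=lambda _: None))  # True
```

`sleep` and `clock` are injectable so the gate can be tested without waiting; in a real pipeline the defaults apply.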

Common Mistakes

  • Terminating the VM immediately after unbinding.
  • Treating a failing health check as proof that the drain has completed.
  • Each vendor names its drain option differently, and the operational runbooks drift apart between devices.

2. Connection lifecycle: From SYN to FIN

Key takeaways

  • Normal termination uses FIN; abnormal termination usually shows up as RST or a timeout.
  • At L4, Connection Draining ultimately means "block new SYNs + wait for existing ESTABLISHED connections to close".

Detailed description

Connection flow for a typical request:

  1. Client -> L4: SYN
  2. L4 -> Nginx (backend VM): Forward SYN
  3. The 3-way handshake completes: ESTABLISHED
  4. Data exchange (HTTP keepalive or WebSocket)
  5. Termination: FIN -> ACK -> FIN -> ACK
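The five steps map onto the TCP states you later see in `ss -ant`. The table below is a sketch of the normal path only, using the RFC 793 state names (`ss` prints them as ESTAB, SYN-RECV, CLOSE-WAIT, and so on); simultaneous close and RST paths are omitted, and it assumes the client closes first. Whichever side closes first is the one that ends up in TIME_WAIT, and a socket that stays in CLOSE_WAIT means the application never called `close()`.

```python
# (state, event) -> next state, normal path only (RFC 793 state names).
TRANSITIONS = {
    # side that opens the connection (the client)
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("SYN_SENT", "recv SYN+ACK / send ACK"): "ESTABLISHED",
    # side that accepts the connection (Nginx on the backend VM)
    ("LISTEN", "recv SYN / send SYN+ACK"): "SYN_RECEIVED",
    ("SYN_RECEIVED", "recv ACK"): "ESTABLISHED",
    # side that sends the first FIN (active close)
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN / send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2*MSL timer expires"): "CLOSED",
    # side that receives the first FIN (passive close)
    ("ESTABLISHED", "recv FIN / send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "app calls close() / send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def replay(start, events):
    """Return the list of states visited when the events happen in order."""
    states = [start]
    for event in events:
        states.append(TRANSITIONS[(states[-1], event)])  # KeyError = not a normal-path move
    return states


CLIENT_EVENTS = ["send SYN", "recv SYN+ACK / send ACK", "send FIN",
                 "recv ACK", "recv FIN / send ACK", "2*MSL timer expires"]
SERVER_EVENTS = ["recv SYN / send SYN+ACK", "recv ACK", "recv FIN / send ACK",
                 "app calls close() / send FIN", "recv ACK"]

print(replay("CLOSED", CLIENT_EVENTS))
# ['CLOSED', 'SYN_SENT', 'ESTABLISHED', 'FIN_WAIT_1', 'FIN_WAIT_2', 'TIME_WAIT', 'CLOSED']
print(replay("LISTEN", SERVER_EVENTS))
# ['LISTEN', 'SYN_RECEIVED', 'ESTABLISHED', 'CLOSE_WAIT', 'LAST_ACK', 'CLOSED']
```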

The problems begin after unbind:

  • New SYNs go to another VM.
  • Existing ESTABLISHED connections stay on the original VM.
  • If that VM is forcibly shut down, those connections end with RST or a timeout instead of FIN, and client errors spike.
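Why existing connections stay put can be sketched with a connection table. This assumes an L4 device that keeps per-flow state keyed by the 5-tuple; `L4Forwarder` and its methods are hypothetical, and the CRC32-based choice stands in for whatever hash the real device uses. SYNs consult the current pool, while every other packet follows the recorded flow entry, so unbinding a VM only affects where new connections land.

```python
import zlib
from typing import Dict, List, Optional, Tuple

# (src_ip, src_port, dst_ip, dst_port, proto)
FiveTuple = Tuple[str, int, str, int, str]


class L4Forwarder:
    """Hypothetical L4 data plane: SYNs pick from the pool, other packets follow the flow table."""

    def __init__(self, backends: List[str]):
        self.pool = list(backends)             # targets eligible for new SYNs
        self.flows: Dict[FiveTuple, str] = {}  # connection table: 5-tuple -> backend

    def forward(self, flow: FiveTuple, syn: bool) -> Optional[str]:
        if syn:
            # New connection: hash the 5-tuple onto the *current* pool members.
            backend = self.pool[zlib.crc32(repr(flow).encode()) % len(self.pool)]
            self.flows[flow] = backend
            return backend
        # Existing connection: stick to the recorded backend, even if it was unbound.
        return self.flows.get(flow)  # None: no entry, the client ends up with RST or a timeout

    def finish(self, flow: FiveTuple) -> None:
        # Called once FIN/ACK has been exchanged in both directions.
        self.flows.pop(flow, None)

    def active_on(self, backend: str) -> int:
        return sum(1 for b in self.flows.values() if b == backend)

    def unbind(self, backend: str, drain: bool) -> None:
        self.pool.remove(backend)
        if not drain:
            # Hard close: drop the backend's flow entries immediately.
            self.flows = {f: b for f, b in self.flows.items() if b != backend}


lb = L4Forwarder(["vm-a", "vm-b"])
old = ("203.0.113.10", 51000, "198.51.100.1", 443, "tcp")
owner = lb.forward(old, syn=True)   # whichever VM the hash picked
lb.unbind(owner, drain=True)

new = ("203.0.113.11", 51001, "198.51.100.1", 443, "tcp")
print(lb.forward(new, syn=True) != owner)   # True: new SYNs avoid the drained VM
print(lb.forward(old, syn=False) == owner)  # True: the existing flow still reaches it
lb.finish(old)
print(lb.active_on(owner))                  # 0: drain complete, safe to terminate
```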

Practical tips

  • Group drain observation metrics along three axes: new, active, and reset.
  • Don't rely on L4 logs alone; also check ESTABLISHED/CLOSE_WAIT inside the VM with ss -ant.

Common Mistakes

  • Watching only the HTTP request success rate and missing the rise in TCP resets.
  • Underestimating drain time by treating keepalive connections as if they were short-lived requests.
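
```bash
# Check connection states per server (local port 443 only; $4 is "Local Address:Port")
ss -ant | awk 'NR==1 || $4 ~ /:443$/'

# Track reset patterns (kernel counters)
netstat -s | grep -Ei 'reset|failed|retrans'
```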
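To track the "active" axis over time, the `ss -ant` output can be reduced to per-state counts for one local port. This is a minimal sketch, assuming the default `ss -ant` column layout (State, Recv-Q, Send-Q, local address:port, peer address:port); `count_states` and the sample text are illustrative. It matches the local port exactly, so outbound connections to a remote :443 and ports such as :4430 are not counted.

```python
from collections import Counter


def count_states(ss_output: str, local_port: int) -> Counter:
    """Count TCP states of sockets whose local port is exactly local_port.

    Assumes the default `ss -ant` layout:
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    counts = Counter()
    for line in ss_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue
        state, local = fields[0], fields[3]
        # rsplit on the last ':' also handles IPv6 forms such as [::]:443
        if local.rsplit(":", 1)[-1] == str(local_port):
            counts[state] += 1
    return counts


SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN     0      511          0.0.0.0:443          0.0.0.0:*
ESTAB      0      0           10.0.0.5:443     203.0.113.10:51000
ESTAB      0      0           10.0.0.5:443     203.0.113.11:51001
CLOSE-WAIT 1      0           10.0.0.5:443     203.0.113.12:51002
TIME-WAIT  0      0           10.0.0.5:443     203.0.113.13:51003
ESTAB      0      0           10.0.0.5:4430    203.0.113.14:51004
ESTAB      0      0           10.0.0.5:40000   198.51.100.7:443
"""

counts = count_states(SAMPLE, 443)
print(counts["ESTAB"], counts["CLOSE-WAIT"])  # 2 1
```

During a drain, the ESTAB count should trend toward zero; a CLOSE-WAIT count that keeps growing points at the application not closing its side.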

3. Architecture diagram: bind/unbind lifecycle

Key takeaways

  • The key turning point is the BOUND -> DRAINING -> UNBOUND state transition.
  • During this window, the application must be ready for Graceful Shutdown.

Detailed description

[Before]
Client -> L4 -> VM-A (new + existing)
Client -> L4 -> VM-B (new + existing)

[After unbind VM-A + drain]
Client -> L4 -X-> VM-A (new blocked)
Client -> L4 ---> VM-B (new accepted)
Existing VM-A connections remain until FIN/timeout

Practical tips

  • Standardizing the LB states as BOUND, DRAINING, and UNBOUND (DETACHED) in internal documents reduces miscommunication between operators.

Common Mistakes

  • Recording the drain state only abstractly, without logging the transition events (timestamps and metrics).
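A minimal sketch of recording transitions as events rather than as a bare state, assuming the BOUND -> DRAINING -> UNBOUND model above. `Target` and the `ALLOWED` table are hypothetical; allowing DRAINING -> BOUND (an aborted maintenance) is an assumption. Each transition stores its timestamp and whatever metrics were observed, and skipping the drain step is rejected outright.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List

# Allowed moves; DRAINING -> BOUND models an aborted maintenance (assumption).
ALLOWED = {
    "BOUND": {"DRAINING"},
    "DRAINING": {"UNBOUND", "BOUND"},
    "UNBOUND": {"BOUND"},
}


@dataclass
class Target:
    name: str
    state: str = "BOUND"
    events: List[Dict] = field(default_factory=list)

    def transition(self, new_state: str, **metrics: int) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: {self.state} -> {new_state} is not allowed")
        # Record when and with which indicators the move happened, not just the new state.
        self.events.append({"at": time.time(), "from": self.state, "to": new_state, **metrics})
        self.state = new_state


vm = Target("vm-a")
vm.transition("DRAINING", new=0, active=120)
vm.transition("UNBOUND", new=0, active=0)
for e in vm.events:
    print(e["from"], "->", e["to"], "active =", e["active"])
```

Going straight from BOUND to UNBOUND raises `ValueError`, which enforces the unbind -> drain -> terminate order in code rather than in the runbook alone.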

4. L4 vs L7 Load Balancer Selection Criteria

Key takeaways

  • L4 excels at TCP session stability and performance; L7 excels at per-request control.
  • With many long-lived WebSocket connections, an L4-centric design is simpler, but the drain policy must be stricter.

Detailed description

| Item | L4 Load Balancer | L7 Load Balancer |
|---|---|---|
| Control unit | TCP connection | HTTP request/stream |
| Strengths | Low overhead, high throughput | Routing and control based on path, headers, cookies |
| Weaknesses | Limited request-level policy | Higher proxy cost |
| WebSocket | Favorable for keeping sessions alive | Quality of the Upgrade handling matters |
| Typical failure symptoms | Mostly RST/timeout | Mostly 5xx/timeout |

Practical tips

  • WebSocket load balancing is generally built either as L4 only or as L4 + Nginx (L7).
  • Zero Downtime Deployment is achievable only when L4 draining is combined with Graceful Shutdown in Nginx and the app.
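The app-side half mirrors the L4 drain: stop accepting, keep serving what is already connected. A minimal sketch, assuming Linux and plain Python sockets (`demo_stop_accepting` is a hypothetical helper, not how any framework implements shutdown): closing only the listening socket makes new connections fail with ECONNREFUSED, while an already-ESTABLISHED connection keeps working until it is closed normally with FIN. Nginx documents `nginx -s quit` as its graceful shutdown, which follows the same idea.

```python
import socket


def demo_stop_accepting():
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]

    existing = socket.create_connection(("127.0.0.1", port))
    conn, _ = listener.accept()

    # First step of a graceful shutdown: close only the listening socket.
    listener.close()

    # New connections are now refused...
    try:
        socket.create_connection(("127.0.0.1", port), timeout=1).close()
        new_result = "accepted"
    except ConnectionRefusedError:
        new_result = "refused"

    # ...while the already-ESTABLISHED connection keeps working.
    existing.sendall(b"ping")
    conn.sendall(conn.recv(4, socket.MSG_WAITALL))  # echo the request back
    reply = existing.recv(4, socket.MSG_WAITALL)

    conn.close()            # server side finishes normally: FIN
    eof = existing.recv(1)  # b"" once the FIN arrives
    existing.close()
    return new_result, reply, eof


print(demo_stop_accepting())
```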

Common Mistakes

  • Adding unnecessary layers, and therefore latency, even when no L7 features are needed.
  • Assuming L4 is safe by itself and skipping graceful handling at the app level.

Operational Checklist

  • Did you switch the target VM to unbind + drain before starting maintenance?
  • Did you confirm new connections = 0 during the drain?
  • Is the reset count rising sharply compared with the usual baseline?
  • Did you check how ESTABLISHED, CLOSE_WAIT, and TIME_WAIT change inside the VM?
  • Are the app's graceful timeout and the LB drain timeout aligned rather than conflicting?
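The last checklist item can be turned into an automated check. This is a minimal sketch under stated assumptions: the stages run one after another (L4 drain, then Nginx, then the app), `terminate_deadline_s` is measured from the unbind, and `longest_request_s` is the longest request or connection you intend to let finish. All names are hypothetical; for Nginx the corresponding setting could be `worker_shutdown_timeout`, while the others depend on your LB and app framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Timeouts:
    lb_drain_s: float            # how long the L4 keeps existing sessions after unbind
    nginx_shutdown_s: float      # how long Nginx may wait for old connections
    app_graceful_s: float        # the app's graceful shutdown window
    terminate_deadline_s: float  # when the VM/process is killed regardless (from unbind)


def check_timeouts(t: Timeouts, longest_request_s: float) -> List[str]:
    """Return human-readable conflicts; an empty list means the timeouts line up."""
    problems = []
    if t.lb_drain_s < longest_request_s:
        problems.append("LB drain ends before the longest request can finish")
    if t.app_graceful_s < longest_request_s:
        problems.append("app graceful window is shorter than the longest request")
    if t.terminate_deadline_s < t.lb_drain_s + t.nginx_shutdown_s + t.app_graceful_s:
        problems.append("hard kill can arrive before drain -> Nginx -> app finish in sequence")
    return problems


print(check_timeouts(Timeouts(60, 30, 30, 150), longest_request_s=20))  # []
print(len(check_timeouts(Timeouts(10, 30, 5, 30), longest_request_s=20)))  # 3
```

For WebSocket traffic, `longest_request_s` is effectively a policy decision: how long you are willing to wait before the remaining sessions are closed.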

Summary

bind/unbind is not just a device setting; it controls connection lifetime. Graceful Shutdown and Zero Downtime Deployment only hold when the shutdown sequences of L4, Nginx, and the app are aligned around Connection Draining.

Next episode preview

In the next part, we analyze in depth how Connection Draining works internally, from the perspective of TCP state transitions (SYN, FIN, RST, TIME_WAIT, CLOSE_WAIT).
