TCP Connections Are Expensive: Optimizing Database Performance in a Distributed System at Rise
Introduction
In distributed systems, TCP connections are the backbone of communication between services and databases. Without careful management, every database query over the network can end up opening its own TCP connection, and each connection consumes system resources. A single connection is cheap in isolation, but at scale these connections become expensive, especially when handling hundreds of thousands of operations.
A database connection typically operates over TCP/IP, establishing a session between the application and the database server. Each connection consumes memory, CPU, and network resources. Without optimization, excessive connections can overwhelm the database, causing slowdowns or system failures.
This article dives into how we tackled severe performance bottlenecks in a Kubernetes-powered distributed system at Rise, built to manage diverse investment products. The core challenge was optimizing database writes to support accurate and timely stock returns for ~300,000 user plans processed every 15 minutes.
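To make that cost concrete, here is a minimal, generic sketch in TypeScript using node-postgres (not code from the Rise system; the connection settings and table name are placeholders). The first function opens and tears down a fresh TCP connection for every query, while the second reuses a small, capped pool:

```typescript
import { Client, Pool } from "pg"; // node-postgres; connection details are placeholders

// Anti-pattern: a brand-new TCP connection (plus auth handshake) per query.
async function queryWithFreshConnection(planId: string) {
  const client = new Client({ host: "db.internal", database: "plans" });
  await client.connect(); // TCP + authentication handshake every single time
  const res = await client.query("SELECT * FROM plans WHERE id = $1", [planId]);
  await client.end(); // connection torn down immediately after one query
  return res.rows[0];
}

// Better: a small, capped pool keeps a handful of long-lived connections open.
const pool = new Pool({ host: "db.internal", database: "plans", max: 10 });

async function queryWithPool(planId: string) {
  // The pool hands out an already-open connection and reclaims it afterwards.
  const res = await pool.query("SELECT * FROM plans WHERE id = $1", [planId]);
  return res.rows[0];
}
```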
System Overview
Plans Service
The Plans Service manages user investments. Each user can create a plan, representing their allocation of funds into different asset portfolios (stocks, real estate, fixed income, etc.). This service maintains:
- Plans Table: Tracks user plans and their configurations.
- Accounts Table: Holds user account details, including balances.
- Yields Table: Records gains or losses credited to user plans.
Portfolios Service
The Portfolios Service manages the system's asset portfolios. Each portfolio reflects a collection of assets whose performance affects user returns. It maintains:
- Portfolios Table: Contains details of each system portfolio.
- Accounts Table: A state replica of the Plans Service accounts, tracking the units each user holds in each portfolio.
Communication Flow
- RabbitMQ: Used to send commands (e.g. fund/withdrawal events) from the Plans Service to the Portfolios Service, which updates portfolio unit allocations accordingly.
- NATS: Used for broadcasting state updates among services. A sketch of both paths follows this list.
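As a rough sketch of these two paths (the queue name, subject, and message shapes here are assumptions, not the actual Rise contracts), the Plans Service might publish a fund command over RabbitMQ while state changes are broadcast over NATS:

```typescript
import amqp from "amqplib";
import { connect, JSONCodec } from "nats";

const jc = JSONCodec();

// Plans Service -> Portfolios Service: a command over RabbitMQ.
async function sendFundCommand(planId: string, amount: number) {
  const conn = await amqp.connect("amqp://rabbitmq:5672");
  const ch = await conn.createChannel();
  await ch.assertQueue("portfolio.commands", { durable: true });
  ch.sendToQueue(
    "portfolio.commands",
    Buffer.from(JSON.stringify({ type: "FUND", planId, amount })),
    { persistent: true }
  );
  await ch.close();
  await conn.close();
}

// Any service -> everyone: a broadcast over NATS.
async function broadcastPlanUpdated(planId: string) {
  const nc = await connect({ servers: "nats://nats:4222" });
  nc.publish("plans.updated", jc.encode({ planId, at: Date.now() }));
  await nc.drain();
}
```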
Stage 1: Redesigning the Returns Generation System
The Problem
User returns were inaccurate due to inconsistent state updates between the Portfolios and Plans Services. The root causes were:
- Unsent or Unprocessed RabbitMQ Messages: Events sometimes failed to deliver or were delayed, leading to state mismatches.
- Race Conditions: Simultaneous updates caused data conflicts.
Even if the messaging inconsistencies were resolved (there are ways we could have made message delivery fault-tolerant), the system still had to calculate each portfolio account's earnings and publish the yield for the Plans Service to consume and record against the corresponding plan. This consumption was painfully slow and often never completed within the 15-minute window, forcing daily manual reconciliations to ensure users received accurate returns, a tedious and error-prone process.
The Original Workflow
- Plans Service: Sent fund/withdrawal events via RabbitMQ to the Portfolios Service.
- Portfolios Service: Updated user state (number of portfolio units).
- Every 15 minutes, it recalculated portfolio index prices, calculated the net gain/loss for each portfolio account, and then published the resulting yield over NATS for the Plans Service to consume.
- Plans Service: Consumed each yield message from NATS and performed the DB writes to record the yield and update the account and plan for the corresponding plan (see the sketch after this list).
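The sketch below illustrates that last step under assumed names: a NATS subscriber handling one yield message per plan and issuing three round-trip writes for each. The subject, tables, and columns are hypothetical.

```typescript
import { connect, JSONCodec } from "nats";
import { Pool } from "pg";

const pool = new Pool({ max: 10 });
const jc = JSONCodec<{ planId: string; accountId: string; amount: number }>();

async function consumeYields() {
  const nc = await connect({ servers: "nats://nats:4222" });
  const sub = nc.subscribe("plans.yield");

  // One message per plan means three round-trip writes per plan,
  // repeated ~300,000 times every 15-minute cycle.
  for await (const msg of sub) {
    const y = jc.decode(msg.data);
    await pool.query(
      "INSERT INTO yields (plan_id, amount) VALUES ($1, $2)",
      [y.planId, y.amount]
    );
    await pool.query(
      "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
      [y.amount, y.accountId]
    );
    await pool.query(
      "UPDATE plans SET total_returns = total_returns + $1 WHERE id = $2",
      [y.amount, y.planId]
    );
  }
}
```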
Problem: This design caused two major issues:
- State Inconsistency: Due to unreliable messaging and race conditions.
- Processing Delays: The sequential processing of 300,000 plans was too slow, leading to incomplete yield recordings and manual reconciliations.
The Redesign
To address these issues, I decided to move the entire returns-generation process into the Plans Service, with the aim of solving both the state inconsistency and the speed problem.
- Portfolios Service: Now only calculates the updated portfolio price and sends a RabbitMQ command with the price difference.
- Plans Service: Consumes this command and loads all the stock plans into a queue for Plans Service workers to pick up and record returns for the corresponding plans (a sketch follows this list).
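A sketch of the redesigned Plans Service side might look like the following; the queue names, payload shape, and the helper that streams plan ids are assumptions made for illustration.

```typescript
import amqp from "amqplib";

interface PriceDiffCommand {
  portfolioId: string;
  priceDiff: number; // change in the portfolio index price this cycle
}

async function startPriceDiffConsumer() {
  const conn = await amqp.connect("amqp://rabbitmq:5672");
  const ch = await conn.createChannel();
  await ch.assertQueue("plans.price-diff", { durable: true });
  await ch.assertQueue("plans.returns-jobs", { durable: true });

  await ch.consume("plans.price-diff", async (msg) => {
    if (!msg) return;
    const cmd: PriceDiffCommand = JSON.parse(msg.content.toString());

    // Fan the work out to Plans Service workers instead of doing it inline.
    // (In this first redesign it was one job per plan; Stage 3 batches them.)
    for await (const planId of streamStockPlanIds(cmd.portfolioId)) {
      ch.sendToQueue(
        "plans.returns-jobs",
        Buffer.from(JSON.stringify({ planId, priceDiff: cmd.priceDiff })),
        { persistent: true }
      );
    }
    ch.ack(msg);
  });
}

// Placeholder: in practice this would page through the ids of all stock
// plans tied to the portfolio from the Plans DB.
async function* streamStockPlanIds(portfolioId: string): AsyncGenerator<string> {
  yield* [] as string[];
}
```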
Result: This redesign resolved the state inconsistencies and improved processing speed. However, even with state accuracy now reliable and processing faster, the full run still couldn't complete within 15 minutes.
Stage 2: Scaling Infrastructure - Resizing NATS Storage
The Problem
The system began encountering "no space left on device" errors in NATS JetStream due to the high volume of messages (~300k) every 15 minutes. You might wonder why NATS was a problem at all. After the returns for each plan are recorded, an update has to be broadcast to the entire system announcing that the plan's state has changed (i.e. it has earned some returns); other parts of the system react to this and perform certain actions. This wasn't a problem before because processing was so slow that the broadcast rate was insignificant. With the improved processing speed, the broadcast rate jumped, overwhelming NATS and halting communication between services.
Diagnosis and Fix
- Port Forwarding: Connected a local NATS client to monitor message flow, revealing rapid message buildup (a monitoring sketch follows this list).
- PVC Expansion: Edited the Kubernetes Persistent Volume Claim (PVC) to allow expansion and increased storage capacity.
- Helm Redeployment: Redeployed NATS resources using Helm to apply changes.
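For the monitoring step, here is a minimal sketch of the kind of local inspection involved, assuming the NATS server has been port-forwarded to localhost and using a placeholder stream name:

```typescript
import { connect } from "nats";

// Inspect JetStream usage through a port-forwarded NATS server
// (e.g. after `kubectl port-forward svc/nats 4222`). "PLANS" is a placeholder.
async function inspectStream() {
  const nc = await connect({ servers: "nats://localhost:4222" });
  const jsm = await nc.jetstreamManager();

  const info = await jsm.streams.info("PLANS");
  console.log("messages:", info.state.messages);
  console.log("bytes:", info.state.bytes);
  console.log("first/last seq:", info.state.first_seq, info.state.last_seq);

  await nc.drain();
}

inspectStream().catch(console.error);
```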
Result: NATS could now handle larger message volumes, but backpressure remained a problem: only about 100k records were processed in each 15-minute cycle. This was addressed in Stage 3.
Stage 3: Optimizing Core Application Logic
The Problem
Even after redesigning the workflow, processing speed remained a bottleneck due to high database load:
- Each plan update required three database writes (yield record, plan account update, plan update).
- This resulted in roughly 900,000 DB connections per cycle (3 writes × ~300,000 plans).
- High contention due to transactional overhead caused slow processing.
The Optimization
- Batch Processing: Replaced individual plan messages with batches. Instead of pushing one message to RabbitMQ per plan for a worker to pick up, we now push an offset and a batch size (1,000) to the queue. Each worker takes its offset, processes its batch of 1,000 plans, and performs 3 bulk writes. This means only 300 messages (each describing a batch of 1,000 plans) are queued instead of 300,000.
- Bulk Database Writes: Consolidated yield inserts, account updates, and plan updates into three bulk operations per batch (see the sketch after this list).
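Below is a hypothetical sketch of one batch worker under these rules: it receives an offset and batch size, loads its 1,000 plans, and performs the three bulk statements inside a single transaction. The table names, columns, and UNNEST-based SQL are assumptions, not the exact queries used at Rise.

```typescript
import { Pool } from "pg";

const pool = new Pool({ max: 10 });

interface BatchJob {
  offset: number;    // where this worker's slice of plans starts
  batchSize: number; // 1,000 in our setup
  priceDiff: number; // change in the portfolio index price this cycle
}

async function processBatch(job: BatchJob) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");

    // Load this worker's batch of stock plans.
    const { rows: plans } = await client.query(
      "SELECT id, account_id, units FROM plans WHERE asset_type = 'stocks' ORDER BY id LIMIT $1 OFFSET $2",
      [job.batchSize, job.offset]
    );

    const planIds = plans.map((p) => p.id);
    const accountIds = plans.map((p) => p.account_id);
    const yields = plans.map((p) => p.units * job.priceDiff);

    // 1) Bulk-insert yield records.
    await client.query(
      "INSERT INTO yields (plan_id, amount) SELECT * FROM UNNEST($1::uuid[], $2::numeric[])",
      [planIds, yields]
    );
    // 2) Bulk-update account balances.
    await client.query(
      `UPDATE accounts a SET balance = a.balance + v.amount
         FROM UNNEST($1::uuid[], $2::numeric[]) AS v(id, amount)
        WHERE a.id = v.id`,
      [accountIds, yields]
    );
    // 3) Bulk-update plans.
    await client.query(
      `UPDATE plans p SET total_returns = p.total_returns + v.amount
         FROM UNNEST($1::uuid[], $2::numeric[]) AS v(id, amount)
        WHERE p.id = v.id`,
      [planIds, yields]
    );

    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```

Because each batch touches the database only a handful of times, the 300 batch jobs translate into roughly 900 bulk operations instead of 900,000 individual writes.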
Result:
- Reduced DB connections from ~900,000 to ~900 (300 batches × 3 bulk writes).
- Completed all returns generation in under 3 minutes, leaving 12 minutes to spare.
Conclusion
By systematically addressing architectural flaws, scaling infrastructure, and optimizing core logic, Rise's distributed system now:
- Accurately calculates and credits user stock returns.
- Efficiently processes 300,000 plans within the required time window.
- Maintains system stability even under heavy load.
This journey at Rise demonstrates how understanding the cost of TCP/database connections and thoughtful optimization can significantly improve distributed system performance.
If you're facing similar challenges or want to discuss system design strategies, feel free to reach out!