Let’s explore how different databases work together in a social media platform like LinkedIn, using a sample profile, Esha Manyam, as a running example. We’ll explain each database type and show how they interact to provide a seamless user experience.
1. Relational Database (RDBMS)
What It Is
A relational database stores data in a structured format using tables, rows, and columns. It enforces relationships between data, making it ideal for transactional operations.
Purpose
- Store structured data like user profiles, connections, job postings, and activities.
Use Case: Esha’s Profile Data
When Esha creates her LinkedIn profile, key details like her name, education, and experience are stored in a relational database such as PostgreSQL.
Example Tables
User Table
Jobs Table
Flow:
When Esha updates her profile or applies for a job, the relational database ensures data consistency across her account and related tables.
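To make the consistency guarantee concrete, here is a minimal sketch of a job application being recorded inside a transaction. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table names, column names, IDs, and sample values are all assumptions made for illustration, not the platform's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here.
# All table names, column names, and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE applications ("
    " user_id INTEGER NOT NULL REFERENCES users(user_id),"
    " job_id INTEGER NOT NULL REFERENCES jobs(job_id),"
    " PRIMARY KEY (user_id, job_id))"
)
conn.execute("INSERT INTO users VALUES (101, 'Esha Manyam')")
conn.execute("INSERT INTO jobs VALUES (9001, 'Senior Data Analyst')")
conn.commit()


def apply_for_job(conn, user_id, job_id):
    """Record an application; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO applications VALUES (?, ?)", (user_id, job_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(apply_for_job(conn, 101, 9001))  # user and job both exist
print(apply_for_job(conn, 101, 4242))  # job 4242 does not exist, so it is rejected
```

The foreign keys are what keep related tables in step: an application can never point at a user or a job that is not there, and a failed insert leaves no partial data behind.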
2. NoSQL Database
What It Is
NoSQL databases handle unstructured or semi-structured data, such as multimedia files or JSON documents. They are designed for high scalability and flexibility.
Purpose
- Store user-generated content like articles, multimedia posts, and messages.
Use Case: Esha’s Posts and Multimedia
When Esha shares a photo from her latest conference or uploads a resume, the post is stored as a document in MongoDB. The document holds the text and metadata, along with a path to the uploaded file.
Example Document in MongoDB
{
  "PostID": "501",
  "UserID": "101",
  "PostType": "image",
  "Content": "Attending Data Conference 2024!",
  "MediaPath": "/uploads/esha_conference.jpg",
  "Timestamp": "2024-11-10"
}
Flow:
Esha’s profile fetches structured data from the RDBMS while her posts (stored in MongoDB) enrich her feed.
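The flow above can be sketched as a small function that joins the two sources on the user ID. Plain Python structures stand in for both stores, so no database driver is involved. The function name build_profile_page and the second post (ID 502) are hypothetical; the second post is there only to show that documents in the same collection need not share the same fields.

```python
# Stand-in for the relational store: one row per user, keyed by user ID.
profiles = {"101": {"name": "Esha Manyam"}}

# Stand-in for the document store: a list of JSON-like documents.
posts = [
    {
        "PostID": "501",
        "UserID": "101",
        "PostType": "image",
        "Content": "Attending Data Conference 2024!",
        "MediaPath": "/uploads/esha_conference.jpg",
        "Timestamp": "2024-11-10",
    },
    {
        # Hypothetical text post: note that it has no MediaPath field.
        "PostID": "502",
        "UserID": "101",
        "PostType": "text",
        "Content": "Key takeaways from the conference.",
        "Timestamp": "2024-11-12",
    },
]


def build_profile_page(user_id):
    """Combine the structured profile row with the user's posts, newest first."""
    profile = profiles[user_id]
    user_posts = [p for p in posts if p["UserID"] == user_id]
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    user_posts.sort(key=lambda p: p["Timestamp"], reverse=True)
    return {"name": profile["name"], "posts": user_posts}


page = build_profile_page("101")
print(page["name"], [p["PostID"] for p in page["posts"]])
```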
3. Column Database
What It Is
Column databases store data by column rather than by row, so an aggregation reads only the columns it needs. This makes them well suited to analytical workloads.
Purpose
- Analyze large datasets, such as user engagement metrics, trending content, and job applications.
Use Case: LinkedIn Analytics
To determine how Esha’s posts perform, the platform uses Amazon Redshift to calculate engagement metrics like views, likes, and shares.
Example Data
Engagement Table (Stored in Columns)
Query Example
SELECT PostType, AVG(Likes + Shares) AS AvgEngagement
FROM EngagementAnalytics
GROUP BY PostType;
A query like this lets LinkedIn’s analytics team compare post types, for example to check whether image posts are the most engaging on average.
Flow:
Esha’s post-engagement data flows from MongoDB into Redshift for analysis, providing insights on which types of content perform best.
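To show why a column layout suits this kind of query, the sketch below stores each column as its own list and computes the same average as the SQL example, touching only the three columns it needs. This is a simplified illustration, not how Redshift is implemented, and every number in it is made-up sample data.

```python
from collections import defaultdict

# Column-oriented layout: one list per column, aligned by position.
# All values are invented sample data.
engagement = {
    "PostType": ["image", "text", "image", "video"],
    "Likes": [120, 40, 80, 60],
    "Shares": [30, 10, 10, 20],
}


def avg_engagement_by_type(table):
    """Equivalent of: SELECT PostType, AVG(Likes + Shares) ... GROUP BY PostType."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    # Only the three relevant columns are read; any other columns are never touched.
    for post_type, likes, shares in zip(
        table["PostType"], table["Likes"], table["Shares"]
    ):
        totals[post_type] += likes + shares
        counts[post_type] += 1
    return {post_type: totals[post_type] / counts[post_type] for post_type in totals}


print(avg_engagement_by_type(engagement))
```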
4. Graph Database
What It Is
A graph database stores data as nodes (entities) and edges (relationships). It excels at managing and querying complex relationships.
Purpose
- Model and analyze user connections, such as friends, colleagues, or mutual connections.
Use Case: Esha’s Network
Esha’s connections, endorsements, and professional groups are stored in Neo4j.
Example Graph Structure
- Nodes:
  - Esha (User)
  - Vamsi (Connection)
  - Tech Group (Professional Group)
- Edges:
  - Esha → Vamsi (Connection)
  - Esha → Tech Group (Membership)
Flow:
When LinkedIn suggests “People You May Know,” Neo4j traverses Esha’s network to find second-degree connections like Vamsi’s colleagues.
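The traversal described above can be sketched with a plain adjacency map in place of Neo4j. Esha and Vamsi come from the example graph; Priya and Ravi are hypothetical colleagues of Vamsi added for illustration, and people_you_may_know is an assumed function name.

```python
# Adjacency sets stand in for the graph database.
# Priya and Ravi are hypothetical; they do not appear in the example graph.
connections = {
    "Esha": {"Vamsi"},
    "Vamsi": {"Esha", "Priya", "Ravi"},
    "Priya": {"Vamsi"},
    "Ravi": {"Vamsi"},
}


def people_you_may_know(graph, user):
    """Return second-degree connections: friends of friends not yet connected."""
    first_degree = graph.get(user, set())
    suggestions = set()
    for friend in first_degree:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone already connected to them.
            if candidate != user and candidate not in first_degree:
                suggestions.add(candidate)
    return sorted(suggestions)


print(people_you_may_know(connections, "Esha"))
```

A graph database runs this kind of two-hop traversal natively, whereas a relational database would need a self-join on the connections table for each extra hop.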
5. Key-Value Database
What It Is
Key-value databases store data as simple key-value pairs for quick access. They are often used for caching and session management.
Purpose
- Provide real-time data retrieval for frequently accessed information like notifications, follower counts, or session data.
Use Case: Esha’s Notifications
When Esha logs in, her unread notifications and profile views are instantly fetched from Redis.
Example Key-Value Pairs
esha_notifications: [notif1, notif2, notif3]
esha_profile_views: 350
Flow:
Redis ensures Esha’s real-time notifications and session data load instantly, enhancing her user experience.
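As a rough picture of what a key-value store offers, here is a tiny in-memory cache with optional expiry. It only mimics the idea behind operations such as set, get, and increment; the class KeyValueCache and its method names are assumptions for this sketch, not the Redis client API.

```python
import time


class KeyValueCache:
    """A minimal in-memory key-value store with optional time-to-live."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # expired entries are dropped on read
            return default
        return value

    def incr(self, key):
        """Add 1 to a counter, starting from 0 if the key is missing."""
        current = self.get(key, 0)
        _, expires_at = self._data.get(key, (None, None))
        self._data[key] = (current + 1, expires_at)
        return current + 1


cache = KeyValueCache()
cache.set("esha_notifications", ["notif1", "notif2", "notif3"])
cache.set("esha_profile_views", 350)
cache.incr("esha_profile_views")  # someone views the profile

print(cache.get("esha_notifications"), cache.get("esha_profile_views"))
```

Every operation is a single lookup by key, which is why this kind of store is fast enough for data that must appear the moment a page loads.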
How These Databases Work Together
Here’s a simplified flow of how LinkedIn, with Esha’s profile, utilizes multiple databases:
- User Data (Relational Database): Esha’s structured data, like her profile and job applications, is stored in PostgreSQL.
- User Content (NoSQL Database): Posts, multimedia uploads, and messages are stored in MongoDB.
- Engagement Analytics (Column Database): Amazon Redshift analyzes engagement metrics to identify trends.
- Connections (Graph Database): Neo4j manages Esha’s professional network and suggests new connections.
- Real-Time Data (Key-Value Database): Redis provides instant access to notifications and session data.
Real-Life Flow Example: Esha Logs In
- Login (Redis): Redis retrieves Esha’s active session and notifications.
- Feed Loading (PostgreSQL & MongoDB): Her feed displays posts from her connections, combining structured and multimedia data.
- Friend Suggestions (Neo4j): LinkedIn suggests new connections based on Esha’s current network.
- Post Analytics (Amazon Redshift): The platform uses analytics to show her how well her latest post is performing.
Why Use Multiple Databases?
Each database type has unique strengths:
- Relational databases ensure data integrity and support complex queries.
- NoSQL databases handle unstructured data like multimedia.
- Column databases provide powerful analytics for user engagement.
- Graph databases manage and explore complex relationships.
- Key-value stores deliver real-time data for caching and notifications.
By integrating these databases, LinkedIn provides a seamless, fast, and feature-rich user experience for profiles like Esha Manyam’s.
Conclusion
Understanding how different databases complement each other is essential for designing scalable applications. Whether it’s storing structured user data, handling unstructured multimedia, or providing real-time updates, each database plays a vital role.