Understanding Different Database Types Using a Social Media Example

Let’s explore how different databases work together in a social media platform like LinkedIn, using a fictional profile, Esha Manyam. We’ll look at each database type in turn and at how the types interact to provide a seamless user experience.

1. Relational Database (RDBMS)

What It Is

A relational database stores data in a structured format using tables, rows, and columns. It enforces relationships between tables through keys and constraints and supports transactions, which makes it well suited to transactional operations.

Purpose

Store structured data like user profiles, connections, job postings, and activities.

Use Case: Esha’s Profile Data

When Esha creates her LinkedIn profile, key details like her name, education, and experience are stored in a relational database such as PostgreSQL.

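Example Tables

Two tables are involved here: a User table holding profile fields such as name, education, and experience, and a Jobs table holding job postings.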

Flow:
When Esha updates her profile or applies for a job, the relational database ensures data consistency across her account and related tables.
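The consistency guarantee described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table names, column names, and sample values (including the job ID and headline) are hypothetical, since the article does not specify a schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical users/jobs/applications tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
    conn.executescript(
        """
        CREATE TABLE users (
            user_id  INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            headline TEXT
        );
        CREATE TABLE jobs (
            job_id INTEGER PRIMARY KEY,
            title  TEXT NOT NULL
        );
        CREATE TABLE applications (
            user_id INTEGER NOT NULL REFERENCES users(user_id),
            job_id  INTEGER NOT NULL REFERENCES jobs(job_id),
            PRIMARY KEY (user_id, job_id)
        );
        """
    )
    return conn


def apply_for_job(conn: sqlite3.Connection, user_id: int, job_id: int) -> bool:
    """Record an application atomically; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO applications (user_id, job_id) VALUES (?, ?)",
                (user_id, job_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
with conn:
    # Made-up sample rows for illustration only.
    conn.execute("INSERT INTO users VALUES (101, 'Esha Manyam', 'Data Engineer')")
    conn.execute("INSERT INTO jobs VALUES (9001, 'Senior Data Engineer')")
```

Calling apply_for_job(conn, 101, 9001) succeeds once; a repeat call, or a call with a user ID that does not exist, is rejected by the primary-key or foreign-key constraint instead of leaving inconsistent rows behind.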

2. NoSQL Database

What It Is

NoSQL databases, here a document database such as MongoDB, handle semi-structured or unstructured data, such as JSON documents that describe posts and their media. They are designed for horizontal scalability and flexible schemas.

Purpose

Store user-generated content like articles, multimedia posts, and messages.

Use Case: Esha’s Posts and Multimedia

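When Esha shares a photo from her latest conference or uploads a resume, the post is stored as a document in MongoDB, which handles such loosely structured content well. As in the example document, the record can reference the media file by path rather than embedding the file itself.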

Example Document in MongoDB

{
  "PostID": "501",
  "UserID": "101",
  "PostType": "image",
  "Content": "Attending Data Conference 2024!",
  "MediaPath": "/uploads/esha_conference.jpg",
  "Timestamp": "2024-11-10"
}

Flow:
Esha’s profile fetches structured data from the RDBMS while her posts (stored in MongoDB) enrich her feed.
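To make that flow concrete, here is a minimal sketch of combining the two sources in application code. Plain Python dictionaries and lists stand in for a relational row and a MongoDB collection; the function name build_profile_page and the extra posts (502, 503) are hypothetical and only there to show filtering and ordering.

```python
# Hypothetical stand-in for a row fetched from the relational database.
profile_row = {"UserID": "101", "Name": "Esha Manyam"}

# Stand-in for a document collection: a list of schema-flexible documents.
posts = [
    {
        "PostID": "501",
        "UserID": "101",
        "PostType": "image",
        "Content": "Attending Data Conference 2024!",
        "MediaPath": "/uploads/esha_conference.jpg",
        "Timestamp": "2024-11-10",
    },
    # A text post simply omits MediaPath; documents need not share one schema.
    {"PostID": "502", "UserID": "101", "PostType": "text",
     "Content": "Hello, network!", "Timestamp": "2024-11-12"},
    # A post by a different (made-up) user, which must be filtered out.
    {"PostID": "503", "UserID": "102", "PostType": "text",
     "Content": "Someone else's post", "Timestamp": "2024-11-11"},
]


def build_profile_page(profile: dict, all_posts: list) -> dict:
    """Attach a user's own posts, newest first, to their profile row."""
    own = [p for p in all_posts if p["UserID"] == profile["UserID"]]
    # ISO-style dates (YYYY-MM-DD) sort correctly as plain strings.
    own.sort(key=lambda p: p["Timestamp"], reverse=True)
    return {**profile, "Posts": own}


page = build_profile_page(profile_row, posts)
```

The join happens in the application rather than in either database, which is a common trade-off when the two kinds of data live in separate stores.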

3. Column Database

What It Is

Column databases store data by column rather than by row. Because a query reads only the columns it needs, they are well suited to analytical workloads and aggregations over large tables.

Purpose

Analyze large datasets, such as user engagement metrics, trending content, and job applications.

Use Case: LinkedIn Analytics

To determine how Esha’s posts perform, the platform uses Amazon Redshift to calculate engagement metrics like views, likes, and shares.

Example Data

An Engagement table, stored column by column, holds each post’s type alongside metrics such as views, likes, and shares.

Query Example

SELECT PostType, AVG(Likes + Shares) AS AvgEngagement
FROM EngagementAnalytics
GROUP BY PostType;

A query like this lets LinkedIn’s analytics team compare post types, for example to check whether image posts are the most engaging on average.

Flow:
Esha’s post-engagement data flows from MongoDB into Redshift for analysis, providing insights on which types of content perform best.
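The idea behind column storage, and the aggregation in the query above, can be imitated in plain Python. This is a rough sketch, not how Redshift works internally: the table is modeled as one list per column, the function name is hypothetical, and every number is made up for illustration.

```python
from collections import defaultdict

# Column-oriented layout: one list per column; row i is index i of every list.
# All values are invented sample data.
engagement = {
    "PostID":   ["501", "502", "503", "504"],
    "PostType": ["image", "text", "image", "video"],
    "Views":    [1200, 300, 800, 500],
    "Likes":    [150, 20, 90, 40],
    "Shares":   [30, 5, 10, 20],
}


def avg_engagement_by_type(table: dict) -> dict:
    """Python counterpart of AVG(Likes + Shares) ... GROUP BY PostType."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    # Only three columns are read; PostID and Views are never touched.
    for post_type, likes, shares in zip(
        table["PostType"], table["Likes"], table["Shares"]
    ):
        totals[post_type] += likes + shares
        counts[post_type] += 1
    return {post_type: totals[post_type] / counts[post_type] for post_type in totals}


result = avg_engagement_by_type(engagement)
```

With this sample data the image posts average 140.0, video 60.0, and text 25.0. The point of the layout is that the aggregation scans only the columns it names, which is what makes column stores efficient for this kind of query.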

4. Graph Database

What It Is

A graph database stores data as nodes (entities) and edges (relationships). It excels at managing and querying complex relationships.

Purpose

Model and analyze user connections, such as friends, colleagues, or mutual connections.

Use Case: Esha’s Network

Esha’s connections, endorsements, and professional groups are stored in Neo4j.

Example Graph Structure

Nodes:
- Esha (User)
- Vamsi (User, connected to Esha)
- Tech Group (Professional Group)
Edges:
- Esha → Vamsi (Connection)
- Esha → Tech Group (Membership)

Flow:
When LinkedIn suggests “People You May Know,” Neo4j traverses Esha’s network to find second-degree connections like Vamsi’s colleagues.
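The traversal behind such a suggestion is essentially a two-hop walk over the graph. The sketch below shows the idea with plain Python sets rather than Neo4j, which would express it as a query in its own language, Cypher. Esha and Vamsi come from the example; Priya and Ravi are hypothetical colleagues of Vamsi added for illustration.

```python
# Undirected connection graph stored as adjacency sets.
connections = {
    "Esha": {"Vamsi"},
    "Vamsi": {"Esha", "Priya", "Ravi"},
    "Priya": {"Vamsi"},
    "Ravi": {"Vamsi"},
}


def second_degree(graph: dict, user: str) -> set:
    """Return people exactly two hops away who are not already direct connections."""
    direct = graph.get(user, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    # Drop the user themselves and anyone they are already connected to.
    return candidates - direct - {user}


suggestions = sorted(second_degree(connections, "Esha"))
```

Here suggestions contains Priya and Ravi. A graph database keeps these hops cheap because each node points directly at its neighbors instead of requiring a join per hop.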

5. Key-Value Database

What It Is

Key-value databases store data as simple key-value pairs for quick access. They are often used for caching and session management.

Purpose

Provide real-time data retrieval for frequently accessed information like notifications, follower counts, or session data.

Use Case: Esha’s Notifications

When Esha logs in, her unread notifications and profile views are instantly fetched from Redis.

Example Key-Value Pairs

esha_notifications: [notif1, notif2, notif3]  
esha_profile_views: 350

Flow:
Because Redis keeps its data in memory, Esha’s notifications and session data load with very low latency, enhancing her user experience.
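The access pattern is simple enough to sketch without a server. The class below is a toy in-memory stand-in, not the Redis client API: TinyKeyValueStore, its method names, and the session key format are all hypothetical. It illustrates three things key-value stores are typically used for: direct lookup by key, counters, and entries that expire.

```python
import time


class TinyKeyValueStore:
    """Toy in-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired keys on read
            return default
        return value

    def incr(self, key, amount=1):
        """Add to an integer counter, starting from 0 if the key is missing."""
        new_value = self.get(key, 0) + amount
        # Keep the existing expiry, if any, when updating the counter.
        _, expires_at = self._data.get(key, (None, None))
        self._data[key] = (new_value, expires_at)
        return new_value


store = TinyKeyValueStore()
store.set("esha_notifications", ["notif1", "notif2", "notif3"])
store.set("esha_profile_views", 350)
store.set("session:esha", {"user_id": "101"}, ttl_seconds=1800)  # 30-minute session
views = store.incr("esha_profile_views")  # someone viewed the profile
```

After the increment, views is 351. Every operation is a single lookup by key, which is why this kind of store is a natural fit for sessions, counters, and cached notifications.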

How These Databases Work Together

Here’s a simplified view of how a platform like LinkedIn could use multiple databases to serve Esha’s profile:

- User Data (Relational Database): Esha’s structured data, like her profile and job applications, is stored in PostgreSQL.
- User Content (NoSQL Database): Posts, multimedia uploads, and messages are stored in MongoDB.
- Engagement Analytics (Column Database): Amazon Redshift analyzes engagement metrics to identify trends.
- Connections (Graph Database): Neo4j manages Esha’s professional network and suggests new connections.
- Real-Time Data (Key-Value Database): Redis provides instant access to notifications and session data.

Real-Life Flow Example: Esha Logs In

- Login (Redis): Redis retrieves Esha’s active session and notifications.
- Feed Loading (PostgreSQL & MongoDB): Her feed displays posts from her connections, combining structured and multimedia data.
- Friend Suggestions (Neo4j): LinkedIn suggests new connections based on Esha’s current network.
- Post Analytics (Amazon Redshift): The platform uses analytics to show her how well her latest post is performing.
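The login flow above can be sketched as a small orchestration function that asks each store for its piece of the page. Everything here is an assumption made for illustration: the Stores container, the function names, and the lambda stubs with made-up return values stand in for real database clients, and a real service would add error handling and likely fetch in parallel.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stores:
    """Hypothetical lookup functions, one per backing store."""
    get_session: Callable[[str], Optional[dict]]  # key-value store
    get_profile: Callable[[str], dict]            # relational database
    get_feed_posts: Callable[[str], list]         # document database
    get_suggestions: Callable[[str], list]        # graph database
    get_post_stats: Callable[[str], dict]         # column database


def load_home_page(user_id: str, stores: Stores) -> dict:
    """Assemble the home page, checking the session before touching other stores."""
    session = stores.get_session(user_id)
    if session is None:
        return {"status": "login_required"}
    return {
        "status": "ok",
        "profile": stores.get_profile(user_id),
        "feed": stores.get_feed_posts(user_id),
        "suggestions": stores.get_suggestions(user_id),
        "post_stats": stores.get_post_stats(user_id),
    }


# Stub implementations with invented data, standing in for real clients.
demo_stores = Stores(
    get_session=lambda uid: {"user_id": uid} if uid == "101" else None,
    get_profile=lambda uid: {"UserID": uid, "Name": "Esha Manyam"},
    get_feed_posts=lambda uid: [{"PostID": "501", "PostType": "image"}],
    get_suggestions=lambda uid: ["Priya"],
    get_post_stats=lambda uid: {"PostID": "501", "Likes": 150, "Shares": 30},
)

home = load_home_page("101", demo_stores)
```

Keeping each store behind its own function means the page logic does not care which product sits underneath, so one store can be swapped without touching the others.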

Why Use Multiple Databases?

Each database type has unique strengths:

- Relational databases ensure data integrity and support complex queries.
- NoSQL document databases handle flexible, loosely structured content such as posts and media metadata.
- Column databases provide fast aggregations for analytics such as user engagement.
- Graph databases manage and explore complex relationships.
- Key-value stores deliver low-latency reads for caching, sessions, and notifications.

By integrating these databases, LinkedIn provides a seamless, fast, and feature-rich user experience for profiles like Esha Manyam’s.

Conclusion

Understanding how different databases complement each other is essential for designing scalable applications. Whether it’s storing structured user data, handling unstructured multimedia, or providing real-time updates, each database plays a vital role.