How to actually choose a database for a new project—a framework that's worked for me
"Just use Postgres" is the default answer online, and honestly it's right about 80% of the time. But there's a decision process worth understanding, especially when your project might fall into that other 20%.
The questions I work through:
Question 1: What's your data shape?
If your data is fundamentally tabular—rows and columns, relationships between entities, structured schemas—this is exactly the workload relational databases were built for: Postgres, MySQL, SQLite (for embedded use), or SQL Server if you're in the Microsoft ecosystem.
If your data is document-oriented—nested objects, variable schemas, JSON blobs that change shape—document databases (MongoDB, CouchDB) might fit better. Though I'll note: Postgres's JSONB handles this pretty well now.
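To make the "Postgres handles documents too" point concrete: JSONB lets you store variable-shape documents in a column and query inside them. The sketch below is a runnable stand-in using SQLite's built-in JSON functions (the Postgres version would use a `jsonb` column and the `->>` operator); the table and payloads are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Documents with different shapes coexist in one column.
conn.execute("INSERT INTO events (payload) VALUES (?)",
             ('{"user": "alice", "action": "login"}',))
conn.execute("INSERT INTO events (payload) VALUES (?)",
             ('{"user": "bob", "action": "purchase", "amount": 42.5}',))

# Filter on a field inside the document; in Postgres this would be
# WHERE payload->>'action' = 'purchase'.
rows = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events "
    "WHERE json_extract(payload, '$.action') = 'purchase'"
).fetchall()
print(rows)  # [('bob',)]
```

If your documents are mostly queried this way, with an occasional join back to relational tables, this pattern often removes the need for a separate document store.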
If your data is graph-shaped—relationships are as important as the entities themselves, you need to traverse connections—consider graph databases (Neo4j, Amazon Neptune). Social networks, recommendation engines, fraud detection.
If your data is time-series—metrics, events, logs with timestamps as the primary dimension—time-series databases (TimescaleDB, InfluxDB, QuestDB) are optimised for this.
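The access pattern that defines time-series work is bucketed aggregation over timestamps—the thing TimescaleDB's `time_bucket` and friends are optimised for. A minimal sketch of that query shape in plain SQL (SQLite used as the stand-in, with a hypothetical metrics table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts TEXT, value REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)", [
    ("2024-01-01 10:05:00", 1.0),
    ("2024-01-01 10:20:00", 3.0),
    ("2024-01-01 11:10:00", 5.0),
])

# Bucket readings by hour and aggregate -- time-series databases
# make exactly this shape of query fast at scale.
rows = conn.execute(
    "SELECT strftime('%Y-%m-%d %H:00', ts) AS hour, avg(value) "
    "FROM metrics GROUP BY hour ORDER BY hour"
).fetchall()
print(rows)  # [('2024-01-01 10:00', 2.0), ('2024-01-01 11:00', 5.0)]
```

At small volumes any database handles this; the specialised engines earn their keep when you're ingesting millions of points and querying wide time ranges.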
Most applications are relational. Don't get clever unless you have a specific reason.
Question 2: What scale do you actually need?
I've seen teams pick distributed databases for applications that never exceeded a few million rows. The complexity cost was enormous and the scale benefits were never needed.
Honest assessment:
< 10 million rows: Almost any database will be fine. Pick based on ergonomics and team experience.
10-100 million rows: A well-configured single-node relational database handles this comfortably. Index properly, query intelligently.
100 million - 1 billion rows: You'll need to think about partitioning, read replicas, caching. Still manageable with relational databases.
1 billion rows or extreme write throughput: Now you're in territory where distributed databases, sharding, or specialised solutions might be justified.
Most of us will never need the last category. Plan for your actual scale, not imagined future scale.
Question 3: What's your consistency requirement?
ACID transactions matter when data integrity is critical—financial systems, inventory, anything where inconsistency causes real problems. Traditional relational databases are built for this.
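The textbook illustration is a funds transfer: either both updates happen or neither does. A minimal sketch using Python's sqlite3, whose connection context manager commits on success and rolls back on an exception (the overdraft rule is a hypothetical business constraint):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # opens a transaction; rolls back if the block raises
        conn.execute("UPDATE accounts SET balance = balance - 150 "
                     "WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 150 "
                     "WHERE name = 'bob'")
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
except ValueError:
    pass

# The failed transfer left no partial state behind.
rows = conn.execute(
    "SELECT name, balance FROM accounts ORDER BY name").fetchall()
print(rows)  # [('alice', 100), ('bob', 0)]
```

Getting that rollback guarantee back on top of an eventually consistent store is hard, application-level work—which is the real argument for starting with a database that gives it to you for free.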
If you can tolerate eventual consistency—analytics, caching, content that's not mission-critical—you have more options and potentially better performance/scalability trade-offs.
Question 4: What does your team know?
A team with deep MySQL experience will build a better system with MySQL than with Postgres, even if Postgres is "better" in the abstract. The database your team understands is the database you should use, unless there's a compelling technical reason otherwise.
My actual decision tree:
Is this a general web application with relational data? → Postgres
Is it embedded, mobile, or single-user? → SQLite
Does the team have strong MySQL experience and no Postgres experience? → MySQL
Is the primary access pattern real-time analytics on time-series data? → TimescaleDB or ClickHouse
Is it genuinely graph-shaped data where traversal is the primary operation? → Neo4j
Am I caching or storing session data? → Redis
Everything else → Postgres (seriously)
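The tree above is simple enough to write down as a toy function—worth doing only to make the point that Postgres is the fall-through case, not one branch among equals. Real decisions weigh these factors together rather than in strict order, and the flag names here are my own shorthand:

```python
def pick_database(*, embedded=False, caching=False,
                  time_series_analytics=False, graph_traversal=False,
                  team_knows_mysql=False):
    """Toy encoding of the decision tree above."""
    if embedded:
        return "SQLite"
    if caching:
        return "Redis"
    if time_series_analytics:
        return "TimescaleDB or ClickHouse"
    if graph_traversal:
        return "Neo4j"
    if team_knows_mysql:
        return "MySQL"
    return "Postgres"  # the default for everything else

print(pick_database())                       # Postgres
print(pick_database(embedded=True))          # SQLite
print(pick_database(team_knows_mysql=True))  # MySQL
```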
The "just use Postgres" meme is mostly right:
Postgres handles relational data, JSON documents, full-text search, time-series (with the TimescaleDB extension), geospatial data (with PostGIS), and key-value patterns. It's rarely the perfect choice, but it's almost never the wrong one.
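The "key-value patterns" claim is less exotic than it sounds: a key-value store inside a relational database is just a two-column table with an upsert. Sketched here with SQLite, whose `ON CONFLICT ... DO UPDATE` syntax matches Postgres's; the key names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

def put(key, value):
    # Upsert: the same INSERT ... ON CONFLICT works in Postgres.
    conn.execute(
        "INSERT INTO kv VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value))

def get(key):
    row = conn.execute(
        "SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

put("session:42", "alice")
put("session:42", "bob")   # overwrite
print(get("session:42"))   # bob
```

It won't match Redis's latency for hot-path caching, but for moderate key-value needs it means one less system to operate.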
When I deviate from Postgres, it's because I have a specific, justifiable reason, not because a different database is "interesting."
What's your decision framework? Has anyone chosen a non-obvious database that turned out to be the right call?