Databases: Beyond SQL
Ever pushed code to production and immediately regretted it? Yeah, me too. The difference between a hobby project and a real application isn't just adding users, it's designing your database like your reputation (and wallet) depends on it. Because they do. This post walks through my mental shift from "just make it work" to "make it work at scale without bankrupting me or getting sued." Spoiler: it starts with asking the right questions before writing a single line of SQL.
Databases, our trusty data store. My first experience with databases came through working on Farmsah (Farm Management Platform). For Farmsah, I needed a place to store farms, crops and other relevant values. I came across SQL vs NoSQL, I chose Postgres, installed it and started with my SQL.
"CREATE DATABASE farmsah;"
A similar approach was taken for tables, and so on.
Looking back, I'm still proud of that approach. It works for personal projects and is an easy way to get familiar with everything, I mean you don't start driving with an 18-wheeler, you start small. Unfortunately, this approach has fast become not feasible. I am not suddenly a 10x developer, but I have grown in my coding and localhost projects do not excite me as much as they used to. This introduces a couple of problems, or I like to call them prayer points. Prayer points because they can really affect my life.
So, now that I want to build projects that have an actual production, I have a couple of options:
- Buy a computer, make it a server and handle everything
- Prayer Point One: we don't have that kind of money here
- Buy a Raspberry Pi, set it up, connect it to the Wi-Fi
- While innovative, I delusionally believe that my SaaS will catch speed and have thousands of users. So the Raspberry Pi won't cut it
- The Cloud: Pay as you use
With my experience with GCP, I will of course favor the cloud—it's convenient and covers Prayer Point One (Finance). But at the same time, the cloud introduces two more Prayer Points:
- Security (The Cloud has a shared responsibility model—basically, we are sharing these services, it's in your best interest to secure your services and not get hacked, because if you do it could lead to me being hacked and vice-versa)
- Scalability (Or as I like to call it, money money money money)
These two Prayer Points are of utmost importance to everyone, including you reading this (assuming you are a dev; if not, welcome, don't be spooked by the nerd talk) because if your production database isn't secure and you have even 1 user and it gets hacked?
YOU ARE SUED. (even worse if that user lives in the EU)
If more users hop on and your database can't handle it? Crash. Now you are without a platform, or worse off, it increases the usage of the database server and Google smiles as you are funding the purchase of the next GPU to train Gemini.
Now, jokes aside, these are real problems I have to face and I plan on facing. The aforementioned have led me to not rush into code, and first step back, look at what I'm planning to build and how. This is where I earn my full-stack developer moniker, actually, I'll go as far as calling myself a backend engineer. So how do I appease both the security and scalability elements?
For the purpose of this illustration, we will use a database for a hypothetical crowdfunding website.
What am I storing? (Entities)
So here the first question I ask myself, or you should ask yourself, is: what am I storing?
Looking at our crowdfunding database we have the following:
- We have the contributors
- The campaigns
- The creators of the campaigns
- Maybe admins?
- Roles of users
This forces you to visualize your entities, give them names. This is step one, as from here you can slowly see that hey, I may need a contributors table, campaigns, and a role field in the contributors, something you may have overlooked.
How do they interact (Entity-Relationships)
Here it's all about asking yourself how each of your defined entities play together, for example:
- A user creates a campaign (One to One)
- An admin approves a campaign
- A user contributes to a campaign, how many can they contribute to? (One to Many?)
- A campaign has users that contribute to it (One to Many)
Once you know how your entities interact, you can actually begin to draw relationships, identify Foreign Keys and the works. Think of it like mapping out who's allowed to sit at which table at a wedding, everyone has a place, and some tables are VIP only.
What can't they do (Access)
What can't certain people do? Here we touch a bit on PP2 (Prayer Point Two), but not as deep yet. So in our case:
- A user that isn't a campaign creator shouldn't be able to delete a campaign (Role Based Access Controls)
- Campaigns that are closed shouldn't be able to receive contributions
This introduces our first layer of defense, technical control (also administrative, depending how you slice your cake. Look up "defense in depth"). We are limiting the actions certain entities can perform on our database; this removes future headaches too.
Thus far we haven't covered our PPs. We start now.
Finance (Scalability)
Now of course our app will need to know how much a campaign has—this is a query, a database sum too. How do we handle that? That sounds simple in theory, but walk with me.
The Query Multiplication Problem
Each time a user logs in we need to show them the total of a campaign. It's not as straightforward as you think once you consider that one user could have 10 campaigns on their home page. 10 Queries? What if we have 10,000 users logging in a period of 20 seconds, 100,000 Queries? This is assuming the totals are there already; let's deal with the calculations.
The Real-Time Calculation Trap
That's not even the end of it. You can say that, oh, let's periodically add up all campaigns, how often do we do this? Every 10 minutes? An hour? What if I contributed, but now I can't see the total go up, did my contributions not make it? The user experience breaks down like a car without oil, it might run for a bit, but eventually, everything seizes up.
The Concurrent Update Nightmare
Okay, so let's add every time a contribution occurs. Well, what if we have 100,000 users contributing at the same time, how many queries is that? What's the instance (computer in the cloud) usage? Is it one spike? What if our database RAM wasn't scaled to handle such situations?
The Real Cost
To fully bring this into scope: a user logs in, sees the first 50 campaigns they can contribute to, that's a query for all 50 and let's multiply that by 100,000 users. See the usage spike? This is why it's a prayer point, because every query is usage, and usage as we said, in the cloud is pay as you use! Let me break down the math: if each database query costs 0.0001 cents (optimistic), and you have 100,000 users each loading 50 campaigns, that's 5 million queries. That's $500 every time your users refresh their feeds. Per day? You're looking at thousands of dollars just in database queries alone.
Now this isn't supposed to be a tutorial on how to deal with this. I'm just sharing my thought process behind designing my databases as of late. I keep a healthy 10,000 users benchmark: can it handle this many users? Usually caching, indexing can handle this, and a lot of smart decisions like not calling the database every time a page is refreshed.
Security
This is a big one, especially with our proposed app.
Data Integrity & Transactions
What if I try to edit the amounts? How do we track transactions, mind you, this is money. Every cent needs a paper trail longer than a CVS receipt.
Sensitive Data Protection
Which data is sensitive? Can I just see Boitumelo's phone number? Should passwords be stored as plain text like "12345687" or properly hashed like "$2b$12$KIXxLpk8Zq3vN.wE5LX7O.fH9wM2pQ1rT4sU6vW7xY8zA9bC0dE1f"? This is also why it's a prayer point, we don't want to end up like the coffee app.
Auditing & Logging
Touching on auditing and logging a bit: every time Boitumelo tries to log in as admin to test his pentesting skills, we should know that, keep a track record of it. When he contributes, we log that. Think of audit logs like security cameras—they don't prevent the crime, but they sure help catch the criminal and prove what happened.
Access Control
And who can access the database? This isn't a house party where everyone gets the Wi-Fi password. Database access should be as exclusive as a VIP lounge, only those who absolutely need it, get it.
This is just the tip of the iceberg. For example, in SA, POPIA Act says I have the right to be forgotten, so let's say we keep a list of campaign contributors, but then I delete my account, what then? I shouldn't see my name there, but you also need to know where the R500 came from. These are all questions that arise.
Obviously, we can't prepare for all this, but we can try. In my next post I'll go over how to deal with these PPs (maybe, if I don't forget).
Between now and again, don't drop production databases (now that's a real prayer point)

Coloured Texts are links by the way
ReplyDelete