Clube do livro de programação.
abra um pr para editar essa pagina: github.com/guites/proggers.
Livro: Designing Data-Intensive Applications
Capítulo discutido: Capítulo 2 - Data models and Query Languages
Data da reunião: 15 de abril de 2026
Este foi o material usado para guiar a reunião. O resumo se encontra em https://guites.github.io/proggers/ddia/2/notes.html.
“The limits of my language mean the limits of my world.” Wittgenstein
Choice of data model shapes how you think about the problem.
Most applications are built by layering one data model on top of another, and usually a common question is: how is it represented in terms of the next-lower layer?
Your brain Layer “I need to store the user name, email and age” you are only thinking about the problem now
The data model layer Then you/or a developer translate the data structures into something a computer can organize, like an spreadsheet, JSON files, XML docs or even just PODs
The database layer The database you select to use, gets the data in those formats and figures out how to store it efficiently, or, how to represent that data as bytes in memory, disks or over the internet.
The hardware layer The computer turns all of that into actual electrical signals, ones and zeros on a chip or disk, also is the only layer that thinks about electricity.
[[Game Engine Architecture Vol I..pdf]]
[!cite] About Layers Like all software systems, game engines are built in layers. Normally upper layers depend on lower layers, but not vice versa. When a lower layer depends upon a higher layer, we call this a circular dependency.
Dependency cycles are to be avoided in any software system, because they lead to undesirable coupling between system make the software untestable and inhibit code reuse. This is especially true for a large‑scale system like a game engine.
[!Cite] About data and assets “Game engines deal with a wide range of asset types, from renderable geometry to materials and textures to animation data to audio. These assets are defined in part by the raw data produced by the artists when they use a tool like Maya, Photoshop or SoundForge.
Every game engine requires some kind of database to manage all of the metadata associated with the game’s assets. This database might be implemented using an honest‑to‑goodness relational database such as MySQL or Oracle, or it might be implemented as a collection of text files, managed by a revision control system such as Subversion, Perforce or Git. We’ll call this metadata the resource database in this book.
No matter in what format the resource database is stored and managed, some kind of user interface must be provided to allow users to author and edit the data. At Naughty Dog, we wrote a custom GUI in C# called Builder for this purpose.”
Impedance Mismatch: friction between two ways of describing the same information.
“descoordenação” entre o modelo mental do programa e o modelo de armazenamento do banco
Your program = objects, linked lists, types, inheritence etc
Your database = tables, rows and columns
exemplo:
struct Item {
std::string nome;
double preco;
int quantidade;
};
struct Pedido {
int id;
std::string cliente;
std::vector<Item> itens;
};tabela pedidos:
id | clientetabela itens_pedido:
id | pedido_id | nome | preco | quantidadesalvar um Pedido:
pedidosiditens_pedido com pedido_idcarregar um Pedido:
pedido_idvector<Item>ou seja, constantemente:
Problema:
std::vector<Pedido> pedidos = carregarPedidos();isso por baixo dos panos roda:
SELECT id, cliente FROM pedidos;e digamos que isso retorne ~100 pedidos
entao pra cada pedido, essa query vai rodar 1x:
SELECT nome, preco, quantidade
FROM itens_pedido
WHERE pedido_id = X;ou seja,
1 query inicial + 100 queries = 101 queries = N+1
entao se:
Relational (SQL, since the 70s) won because it hid the messy storage stuff and let people just declare what they wanted.
It assumed data is rows in tables with relationships between them via foreign keys.
Document databases (aka noSQL) (MongoDB, CouchDB, etc.) came back into fashion because:
A person has a name, a job history (list), an education list, contact info. In a document DB you just store the whole résumé as one JSON blob: [[Pasted image 20260415093248.png]] [[Pasted image 20260415093248.png|322]]
{
"user_id": 251,
"first_name": "Bill",
"last_name": "Gates",
"positions": [
{"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"},
{"job_title": "Co-founder", "organization": "Microsoft"}
],
"education": [
{"school_name": "Harvard University", "start": 1973, "end": 1975}
]
}In SQL you’d shred that into a users table, a
positions table, an education table, and join
them back together. The document version has nice locality (one fetch,
all the data). The SQL version is better when many people share the same
data, like if “Microsoft” is its own entity that 10,000 résumés point
to. Update the company name once, everyone sees it. In the document
version you’d have to find every résumé and update it.
-- tables
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
first_name TEXT,
last_name TEXT
);
CREATE TABLE organizations (
org_id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE schools (
school_id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE positions (
position_id INTEGER PRIMARY KEY,
user_id INTEGER REFERENCES users(user_id),
org_id INTEGER REFERENCES organizations(org_id),
job_title TEXT
);
CREATE TABLE education (
education_id INTEGER PRIMARY KEY,
user_id INTEGER REFERENCES users(user_id),
school_id INTEGER REFERENCES schools(school_id),
start_year INTEGER,
end_year INTEGER
);
-- data
INSERT INTO users VALUES (251, 'Bill', 'Gates');
INSERT INTO organizations VALUES
(1, 'Bill & Melinda Gates Foundation'),
(2, 'Microsoft');
INSERT INTO schools VALUES
(1, 'Harvard University');
INSERT INTO positions VALUES
(1, 251, 1, 'Co-chair'),
(2, 251, 2, 'Co-founder');
INSERT INTO education VALUES
(1, 251, 1, 1973, 1975);One JSON blob became 5 tables and 7 inserts.
To get the résumé back you do a join:
SELECT u.first_name, u.last_name, p.job_title, o.name AS organization
FROM users u
LEFT JOIN positions p ON p.user_id = u.user_id
LEFT JOIN organizations o ON o.org_id = p.org_id
WHERE u.user_id = 251;To determine relationships in SQL:
One-to-Many (1:N): One entity instance relates to many instances of another entity.
one user ─────────> many positions
(Bill) (Co-founder at MS, Co-chair at Gates Foundation)
users positions
┌─────────┬──────┐ ┌─────────┬───────────────┐
│ user_id │ name │ │ user_id │ job_title │
├─────────┼──────┤ ├─────────┼───────────────┤
│ 251 │ Bill │ <────────── │ 251 │ Co-founder │
└─────────┴──────┘ <─────── │ 251 │ Co-chair │
│ 251 │ Trustee │
└─────────┴───────────────┘
Bill has 3 positions. Each position belongs to exactly one Bill. One-to-many.
Many-to-One (N:1): Many instances of one entity relate to one instance of another entity. Same relationship, looked at from the other side. Many rows in B point at one row in A.
many positions ─────────> one organization
(Bill's job, Linus's job) (Microsoft)
positions organizations
┌─────────┬────────┐ ┌────────┬───────────┐
│ user_id │ org_id │ │ org_id │ name │
├─────────┼────────┤ ├────────┼───────────┤
│ 251 │ 1 │ ─────────>│ 1 │ Microsoft │
│ 252 │ 1 │ ─────────>│ │ │
│ 253 │ 1 │ ─────────>│ │ │
└─────────┴────────┘ └────────┴───────────┘
Many positions, one Microsoft. “One-to-many” and “many-to-one” are literally the same rela=-tionship, you just pick which side you’re standing on.
Many-to-Many (N:M): Many instances of one entity relate to many instances of another entity. Both sides can have many of the other. A student takes many courses, a course has many students. Neither side “owns” the other.
students enrollments courses
┌────────────┬──────┐ ┌────────────┬───────────┐ ┌───────────┬─────────┐
│ student_id │ name │ │ student_id │ course_id │ │ course_id │ title │
├────────────┼──────┤ ├────────────┼───────────┤ ├───────────┼─────────┤
│ 1 │ Ada │ ◄──│ 1 │ 101 │───►│ 101 │ Math │
│ 2 │ Bob │ ◄──│ 1 │ 102 │───►│ 102 │ Physics │
└────────────┴──────┘ ◄──│ 2 │ 101 │───►└───────────┴─────────┘
└────────────┴───────────┘
You can’t represent this with one foreign key, you will need a third table in the middle, called a join table.
This is the core trade-off:
TLDR:
[[Pasted image 20260415100707.png|116]]
Tree-shaped means the data branches downward and never loops back to itself.
Resume (Bill)
│
┌─────────┼──────────┐
Positions Education Contact info
│ │
┌───┴───┐ │
Job1 Job2 Harvard
Each thing has one parent. Nothing points sideways. Nothing points back up.
A document database is basically storing the whole tree as one nested blob:
{
"name": "Bill",
"positions": [{...}, {...}],
"education": [{...}]
}If your data is shaped like a tree, this is great. One read gets the whole thing.
But the moment your data stops being a tree, or the moment one node needs to point at another node that lives in a different tree, you’re not in tree territory anymore. You’re in graph territory. That’s when you start needing foreign keys, join tables, or an actual graph database.
Tree: Not a tree (graph):
A A ──> B
/ \ │ │
B C v v
/ \ C <── D
D E │
└──> A <- loops back