Skip to content

Data Structures

Data structure is not a side detail.

It is the way you decide to organize the world inside the program.

And that changes everything.

The same business rule can become:

  • simple, direct code
  • or a Frankenstein full of slow lookups, patches, and duplicated state

If you want to stop solving everything with “whatever works for now,” this is one of the most important pages in the reference library.

The right question is not “which structure is best?”

Section titled “The right question is not “which structure is best?””

The right question is:

which operation dominates this problem?

Because data structures are not chosen by fashion.

You choose them based on the main pressure of the real case:

  • do I need index access?
  • do I need key lookup?
  • do I need uniqueness?
  • do I need arrival order?
  • do I need priority?
  • do I need to represent relationships?

Once you learn this, data stops feeling like shapeless mass.

A data structure is an organized way of storing values so certain operations become easier.

Every structure has strengths and weaknesses.

In other words:

a structure that is great for one operation
may be poor for another

Classic example:

  • array is great for index access
  • map is great for key lookup
  • queue is great for arrival order
  • heap is great for priority

There is no magic structure.

There is coherent selection.

This distinction matters a lot.

Answers:

how is the data organized?

Answers:

how do I transform this data to reach the result?

They work together.

Example:

  • structure: Map
  • algorithm: search, update, frequency counting

Without a good structure, the algorithm struggles.

Without a good algorithm, structure alone does not save you.

Before choosing a structure, answer:

  1. Which operation happens most often?
  2. Does order matter?
  3. Is access by index, key, priority, or relationship?
  4. Will I insert and remove a lot?
  5. Do I need uniqueness?
  6. Will I scan everything or fetch one specific item?

That already solves most decisions.

Main situationStructure that usually fits
Order and indexArray / list
Local insert/remove with pointersLinked list
Undo, history, parsingStack
Arrival orderQueue
Work from both endsDeque
Lookup by keyMap / dictionary / hash map
Guarantee uniquenessSet
HierarchyTree
Frequent min/max accessHeap / priority queue
Relationships and pathsGraph

Do not memorize this as a ritual.

Use it as a starting map.

You do not need to memorize a hundred tables right now.

But you do need to feel the difference between:

  • direct access
  • item-by-item search
  • insertion at the front
  • insertion in the middle
  • removing from the front

Simplified table:

StructureIndex accessKey/value lookupInsert at endInsert at frontMain note
Array / dynamic listgoodusually lineargood amortizedpoorgreat for sequence
Linked listpoorlineargood with the right pointergoodpoor for index access
Stacktop onlynot the pointpush/pop are goodn/aLIFO
Queuefront and backnot the pointenqueue/dequeue are goodn/aFIFO
Map / hash mapn/ausually very goodvery goodn/akey -> value
Setn/avery good for existencevery goodn/auniqueness
Heapnot for arbitrary indexingtop is very goodinsertion is goodn/apriority
Graphdepends on representationdependsdependsdependsmodels relationships

Now let us get into the real value: structure by structure.

An array is a collection of elements stored in sequence.

At a low level, the strong idea is:

elements live in contiguous memory positions
Index: 0 1 2 3
+----+----+----+----+
Value: | 10 | 20 | 30 | 40 |
+----+----+----+----+

If the element has fixed size, accessing index i is very natural.

  • order matters
  • you iterate a lot
  • index access makes sense
  • append at the end solves most of the job
  • you insert in the middle all the time
  • you keep removing from the front
  • you constantly search by ID inside the list

In C, array is a core structure.

#include <stdio.h>
int main(void) {
int numbers[4] = {10, 20, 30, 40};
printf("%d\n", numbers[0]); // 10
printf("%d\n", numbers[2]); // 30
for (int i = 0; i < 4; i++) {
printf("%d ", numbers[i]);
}
return 0;
}
base -> +----+----+----+----+
| 10 | 20 | 30 | 40 |
+----+----+----+----+
^ ^ ^ ^
+0 +4 +8 +12 (example with 4-byte int)

std::vector is the most common dynamic list in everyday C++.

#include <iostream>
#include <vector>
int main() {
std::vector<int> numbers = {10, 20, 30};
numbers.push_back(40);
for (int value : numbers) {
std::cout << value << ' ';
}
}
const numbers = [10, 20, 30];
numbers.push(40);
console.log(numbers[1]); // 20
console.log(numbers); // [10, 20, 30, 40]
numbers = [10, 20, 30]
numbers.append(40)
print(numbers[1]) # 20
print(numbers) # [10, 20, 30, 40]

If you need to insert at index 1:

Before:
[10, 20, 30, 40]
Insert 99 at 1
After:
[10, 99, 20, 30, 40]

Following elements must shift.

Diagram:

Index: 0 1 2 3
Before: 10 20 30 40
Index: 0 1 2 3 4
After: 10 99 20 30 40

That is why arrays are not universal kings.

A linked list solves a different kind of pressure.

Instead of storing everything contiguously, it chains nodes.

Each node usually stores:

  • value
  • pointer to the next node
+------+-------+ +------+-------+ +------+------+
| 10 | next -|--->| 20 | next -|--->| 30 | null |
+------+-------+ +------+-------+ +------+------+
  • inserting or removing a known node can be cheap
  • useful for pointer-based structures
  • index access is poor
  • extra pointer memory exists
  • cache locality is often worse than arrays
#include <stdio.h>
#include <stdlib.h>
typedef struct Node {
int value;
struct Node* next;
} Node;
Node* create_node(int value) {
Node* node = (Node*)malloc(sizeof(Node));
node->value = value;
node->next = NULL;
return node;
}
void append(Node** head, int value) {
Node* new_node = create_node(value);
if (*head == NULL) {
*head = new_node;
return;
}
Node* current = *head;
while (current->next != NULL) {
current = current->next;
}
current->next = new_node;
}
void print_list(Node* head) {
Node* current = head;
while (current != NULL) {
printf("%d -> ", current->value);
current = current->next;
}
printf("NULL\n");
}
int main(void) {
Node* head = NULL;
append(&head, 10);
append(&head, 20);
append(&head, 30);
print_list(head);
return 0;
}

A stack follows LIFO:

Last In, First Out

That means:

  • the last item in is the first item out
top
|
v
+----+
| 30 |
+----+
| 20 |
+----+
| 10 |
+----+
  • undo actions
  • history
  • expression parsing
  • parentheses validation
  • function calls
  • iterative DFS
#include <stdbool.h>
#include <stdio.h>
#define MAX 5
typedef struct {
int items[MAX];
int top;
} Stack;
void init_stack(Stack* s) {
s->top = -1;
}
bool is_empty(Stack* s) {
return s->top == -1;
}
bool is_full(Stack* s) {
return s->top == MAX - 1;
}
bool push(Stack* s, int value) {
if (is_full(s)) {
return false;
}
s->items[++s->top] = value;
return true;
}
bool pop(Stack* s, int* removed) {
if (is_empty(s)) {
return false;
}
*removed = s->items[s->top--];
return true;
}
int main(void) {
Stack s;
int removed;
init_stack(&s);
push(&s, 10);
push(&s, 20);
push(&s, 30);
pop(&s, &removed);
printf("%d\n", removed); // 30
return 0;
}

In JavaScript, array already works well as a stack:

const stack = [];
stack.push(10);
stack.push(20);
stack.push(30);
console.log(stack.pop()); // 30
console.log(stack.pop()); // 20

A queue follows FIFO:

First In, First Out

That means:

  • the first item in is the first item out
input -> [10] [20] [30] -> output
  • service queues
  • async processing
  • pending jobs
  • events
  • BFS

This matters because removing from the front of a plain array can be expensive.

#include <stdbool.h>
#include <stdio.h>
#define MAX 5
typedef struct {
int items[MAX];
int front;
int rear;
int size;
} Queue;
void init_queue(Queue* q) {
q->front = 0;
q->rear = 0;
q->size = 0;
}
bool enqueue(Queue* q, int value) {
if (q->size == MAX) {
return false;
}
q->items[q->rear] = value;
q->rear = (q->rear + 1) % MAX;
q->size++;
return true;
}
bool dequeue(Queue* q, int* removed) {
if (q->size == 0) {
return false;
}
*removed = q->items[q->front];
q->front = (q->front + 1) % MAX;
q->size--;
return true;
}
int main(void) {
Queue q;
int removed;
init_queue(&q);
enqueue(&q, 10);
enqueue(&q, 20);
enqueue(&q, 30);
dequeue(&q, &removed);
printf("%d\n", removed); // 10
return 0;
}

For real queue behavior in Python, collections.deque is usually better than a regular list.

from collections import deque
queue = deque()
queue.append(10)
queue.append(20)
queue.append(30)
print(queue.popleft()) # 10

Deque means double-ended queue.

That means working efficiently from both ends.

from collections import deque
dq = deque([10, 20, 30])
dq.appendleft(5)
dq.append(40)
print(dq) # deque([5, 10, 20, 30, 40])

This is one of the highest-value structures in real software.

Map solves:

key -> value

Examples:

  • userId -> user
  • email -> account
  • word -> frequency
  • sku -> product

Because a lot of day-to-day software is really about key lookup.

If you use a list for that, the pattern becomes:

scan item by item
again
and again
and again

With a map, the intent becomes much clearer.

Under the hood, many maps use a hash table.

Mentally:

  1. take the key
  2. compute a hash
  3. use that to decide where to store the value

Simplified diagram:

"ana" -> hash -> bucket 2
"bia" -> hash -> bucket 5
"leo" -> hash -> bucket 2

When two keys land in the same bucket, you have a collision.

Then the implementation resolves it with a strategy.

#include <iostream>
#include <string>
#include <unordered_map>
int main() {
std::unordered_map<std::string, int> scores;
scores["ana"] = 95;
scores["bia"] = 88;
std::cout << scores["ana"] << '\n'; // 95
}
const users = new Map();
users.set("u1", { name: "Ana" });
users.set("u2", { name: "Leo" });
console.log(users.get("u1")); // { name: "Ana" }
console.log(users.has("u3")); // false
scores = {
"ana": 95,
"bia": 88,
}
print(scores["ana"]) # 95
print("leo" in scores) # False

Frequency counting with map: a very important pattern

Section titled “Frequency counting with map: a very important pattern”
text = "banana"
freq = {}
for letter in text:
freq[letter] = freq.get(letter, 0) + 1
print(freq)

Output:

{'b': 1, 'a': 3, 'n': 2}

This pattern appears in:

  • counting
  • deduplication
  • grouping
  • log analysis

Set is for when the main question is:

has this already appeared?

Or:

do I need to prevent repetition?
Input: [10, 20, 20, 30, 10]
Set: {10, 20, 30}
  • removing duplicates
  • preventing reprocessing
  • tracking seen items
  • fast membership test
values = [10, 20, 20, 30, 10]
unique = set(values)
print(unique) # {10, 20, 30}
print(20 in unique) # True
const values = [10, 20, 20, 30, 10];
const unique = new Set(values);
console.log(unique); // Set(3) { 10, 20, 30 }
console.log(unique.has(20)); // true

A tree appears when hierarchy exists.

Examples:

  • DOM
  • file system
  • categories
  • compiler AST
  • nested menus
8
/ \
3 10
/ \ \
1 6 14
  • root
  • node
  • child
  • parent
  • leaf
  • height

In a BST:

  • left < node
  • right > node

That helps ordered searching, but performance depends on balance.

#include <iostream>
struct Node {
int value;
Node* left;
Node* right;
Node(int v) : value(v), left(nullptr), right(nullptr) {}
};
Node* insert(Node* root, int value) {
if (root == nullptr) {
return new Node(value);
}
if (value < root->value) {
root->left = insert(root->left, value);
} else {
root->right = insert(root->right, value);
}
return root;
}
void inorder(Node* root) {
if (root == nullptr) {
return;
}
inorder(root->left);
std::cout << root->value << ' ';
inorder(root->right);
}
int main() {
Node* root = nullptr;
root = insert(root, 8);
root = insert(root, 3);
root = insert(root, 10);
root = insert(root, 1);
root = insert(root, 6);
inorder(root); // 1 3 6 8 10
}

A heap is a priority structure.

It is not just “some tree.”

It is a tree with a heap rule.

Each parent is smaller than or equal to its children.

2
/ \
5 8
/ \
9 12

The strong point is:

get the smallest
or the largest, depending on the variant
  • scheduling
  • top K
  • priority queues
  • stream merging
  • repeated minimum-cost extraction
import heapq
heap = []
heapq.heappush(heap, 8)
heapq.heappush(heap, 3)
heapq.heappush(heap, 5)
print(heapq.heappop(heap)) # 3
print(heapq.heappop(heap)) # 5

By default, priority_queue gives you the largest item first.

#include <iostream>
#include <queue>
int main() {
std::priority_queue<int> pq;
pq.push(8);
pq.push(3);
pq.push(10);
std::cout << pq.top() << '\n'; // 10
}

Graph enters when the problem is about relationships.

Examples:

  • social networks
  • routes
  • dependencies
  • recommendations
  • service mesh
A ---- B
| |
| |
C ---- D
A: [B, C]
B: [A, D]
C: [A, D]
D: [B, C]

This representation is often useful because it is clear and space-efficient in many scenarios.

graph = {
"A": ["B", "C"],
"B": ["A", "D"],
"C": ["A", "D"],
"D": ["B", "C"],
}
for neighbor in graph["A"]:
print(neighbor)
const graph = {
A: ["B", "C"],
B: ["D"],
C: ["D"],
D: [],
};
function bfs(start) {
const queue = [start];
const visited = new Set([start]);
while (queue.length > 0) {
const node = queue.shift();
console.log(node);
for (const neighbor of graph[node]) {
if (!visited.has(neighbor)) {
visited.add(neighbor);
queue.push(neighbor);
}
}
}
}
bfs("A");

How to choose the right structure in the real world

Section titled “How to choose the right structure in the real world”

If you have this:

list of products
and you keep searching by id

You probably want a map.

If the main job is sequential traversal and rendering:

You probably want an array/list.

You want to go back to the previous item.

You probably want a stack.

You want arrival-order processing.

You probably want a queue.

You want to know whether an item was already seen.

You probably want a set.

You always want the most urgent item next.

You probably want a heap / priority queue.

You need to model connections.

You probably want a graph.

  • you keep doing linear search on the same collection
  • every file recreates the same index manually
  • your code uses lists for everything
  • you duplicated the same data in multiple collections without a strategy
  • access rules are hidden behind find, filter, some all the time
  • choosing a structure because the name sounds advanced
  • using a map when a list would be clearer
  • using a list for key lookup at high volume
  • forgetting the cost of removing from the front
  • confusing “it works” with “it is well modeled”
  • spreading raw structure manipulation across the system

Even with a good structure, do not spread raw manipulation across the project.

Better:

  • addOrder(order)
  • findOrderById(id)
  • markOrderProcessed(id)

Worse:

  • every file touching the same array, map, and set directly

Good structure + clear API = much more sustainable software.

  1. implement a stack with array
  2. implement a queue with circular buffer
  3. count word frequency with map
  4. remove duplicates with set
  5. build a simple linked list
  6. implement a basic BST
  7. use a heap for the top 3 smallest values
  8. model a graph with adjacency list

For each exercise, answer:

  1. which operation does this structure favor?
  2. what does it do badly?
  3. why did I choose it instead of a plain list?
  4. how would I explain this choice in an interview or review?
  • Do you know when array beats linked list?
  • Do you understand why stack and queue exist even though arrays exist?
  • Do you know when map is more appropriate than list?
  • Do you know when set prevents bugs and simplifies logic?
  • Do you understand the basic idea of tree, heap, and graph?
  • Can you justify your choice based on the dominant operation?

If yes, your modeling foundation is already leaving tutorial level.

  • If the reasoning base still feels shaky, review Programming Logic
  • If you want to think more about solution quality and cost, continue to Algorithms
  • If you want to strengthen low-level representation, revisit Data Types