Data Structures and Applications

Comprehensive Theory Notes for CBDT Assistant Director (Systems) Exam

1. Introduction to Data Structures

A data structure is a systematic way of organizing, storing, and managing data so that it can be accessed and modified efficiently. Data structures are the backbone of computer science and software engineering.

Classification of Data Structures

2. Arrays

An array is a collection of elements of the same data type stored in contiguous memory locations.

Key Properties

Fixed size (static allocation)
Random access via index: O(1)
Homogeneous elements
Memory address of element: Address = Base_Address + (index × size_of_element)

Operations and Complexities

Operation	Best	Average	Worst
Access	O(1)	O(1)	O(1)
Search	O(1)	O(n)	O(n)
Insertion	O(1)	O(n)	O(n)
Deletion	O(1)	O(n)	O(n)

Types of Arrays

One-dimensional: Simple linear array
Two-dimensional: Matrix representation (rows × columns)
Multi-dimensional: Arrays of arrays

Applications

Implementing other data structures (stacks, queues, heaps)
Lookup tables and matrices in scientific computing
String storage (character arrays)

3. Linked Lists

A linked list is a linear collection of data elements (nodes) where each node points to the next.

Types of Linked Lists

Type	Description
Singly Linked List	Each node points to next node only
Doubly Linked List	Each node has pointers to both next and previous
Circular Linked List	Last node points back to first node
Doubly Circular	Combination of doubly and circular

Operations and Complexities

Operation	Singly	Doubly
Insert at head	O(1)	O(1)
Insert at tail	O(n) / O(1)*	O(1)
Delete head	O(1)	O(1)
Search	O(n)	O(n)
Random Access	O(n)	O(n)

*O(1) if tail pointer is maintained

Advantages over Arrays

Dynamic size (no need to pre-allocate)
Efficient insertion/deletion at any position
No memory wastage

Disadvantages

No random access
Extra memory for pointers
Poor cache locality

Applications

Dynamic memory allocation
Implementing stacks, queues, and hash tables
Polynomial arithmetic
Undo functionality in applications

4. Stacks

A Stack is a linear data structure following LIFO (Last In, First Out) principle.

Operations

push(x): Insert element x at top — O(1)
pop(): Remove and return top element — O(1)
peek()/top(): Return top element without removing — O(1)
isEmpty(): Check if stack is empty — O(1)

Implementations

Array-based: Fixed or dynamic size; may overflow
Linked list-based: No size limit; uses extra pointer memory

Key Concepts

Overflow: Push on a full stack
Underflow: Pop from an empty stack
Top pointer: Always points to the most recently added element

Applications

Expression evaluation and conversion (infix ↔ postfix ↔ prefix)
Parenthesis matching and syntax parsing
Function call management (call stack / activation records)
Undo/Redo operations in editors
Backtracking algorithms (maze solving, N-Queens)
Tower of Hanoi problem
DFS (Depth-First Search) implementation

Infix to Postfix Conversion (Algorithm)

Scan the infix expression left to right
If operand → add to output
If '(' → push to stack
If ')' → pop and output until '(' is found
If operator → pop operators with higher or equal precedence, then push current operator
After scanning, pop all remaining operators

5. Queues

A Queue is a linear data structure following FIFO (First In, First Out) principle.

Operations

enqueue(x): Insert element at rear — O(1)
dequeue(): Remove element from front — O(1)
front(): Return front element — O(1)
isEmpty(): Check if queue is empty — O(1)

Types of Queues

Type	Description
Simple Queue	Standard FIFO
Circular Queue	Last position connects to first; solves memory wastage
Priority Queue	Elements served by priority, not arrival order
Deque (Double-ended)	Insert/delete from both ends
Input-restricted Deque	Insert at one end, delete from both
Output-restricted Deque	Insert at both ends, delete from one

Circular Queue

Solves the problem of unused space in array-based queues
Full condition: (REAR + 1) % n == FRONT
Empty condition: REAR == FRONT
Next position: (REAR + 1) % n

Applications

CPU scheduling (Round Robin, FCFS)
Disk scheduling algorithms
BFS (Breadth-First Search) traversal
Spooling in print management
Message queues in distributed systems
Handling requests in web servers

6. Trees

A Tree is a hierarchical, non-linear data structure consisting of nodes with a root and subtrees.

Terminology

Root: Topmost node (no parent)
Parent/Child: Direct ancestor/descendant relationship
Leaf: Node with no children
Degree: Number of children of a node
Height: Longest path from node to leaf
Level: Root is at level 0; children at level 1, etc.
Forest: Collection of disjoint trees

Binary Tree

Each node has at most 2 children (left and right).

Maximum nodes at level i: 2^i
Maximum nodes in tree of height h: 2^(h+1) - 1
Minimum nodes in tree of height h: h + 1

Types of Binary Trees

Type	Description
Full/Strict	Every node has 0 or 2 children
Complete	All levels filled except possibly last; filled left to right
Perfect	All internal nodes have 2 children; all leaves at same level
Degenerate/Pathological	Each parent has exactly one child (like linked list)
Balanced	Height difference of left and right subtrees ≤ 1

Tree Traversals

Traversal	Order	Use Case
Inorder	Left → Root → Right	BST gives sorted order
Preorder	Root → Left → Right	Create copy, prefix expression
Postorder	Left → Right → Root	Delete tree, postfix expression
Level-order	Level by level (BFS)	Level-wise processing

Binary Search Tree (BST)

Left subtree contains only nodes with values less than root
Right subtree contains only values greater than root
Both subtrees are also BSTs

Operation	Average	Worst (skewed)
Search	O(log n)	O(n)
Insert	O(log n)	O(n)
Delete	O(log n)	O(n)

BST Deletion Cases:
1. Leaf node: Simply remove
2. One child: Replace with child
3. Two children: Replace with inorder successor (smallest in right subtree) or inorder predecessor (largest in left subtree)

AVL Trees (Adelson-Vland and Landis)

Self-balancing BST where the balance factor (height of left subtree - height of right subtree) is in {-1, 0, 1}.

Rotations for Rebalancing:
| Case | Condition | Rotation |
|------|-----------|----------|
| LL (Left-Left) | Left-heavy, left child left-heavy | Right Rotation |
| RR (Right-Right) | Right-heavy, right child right-heavy | Left Rotation |
| LR (Left-Right) | Left-heavy, left child right-heavy | Left-Right Rotation |
| RL (Right-Left) | Right-heavy, right child left-heavy | Right-Left Rotation |

Guaranteed height: h ≤ 1.44 log₂(n) → all operations O(log n)

B-Trees

Self-balancing search tree designed for disk-based storage (databases, file systems).

Properties of B-Tree of order m:
- Each node has at most m children
- Each non-root node has at least ⌈m/2⌉ children
- Root has at least 2 children (if not leaf)
- All leaves at same level
- A non-leaf node with k children contains k-1 keys

B+ Trees

Variant of B-tree where all data pointers are at leaf level, and leaves are linked.

Advantages over B-Tree:
- More keys per node (smaller pointers at internal nodes)
- Sequential access via linked leaves
- Better for range queries

Used in: Database indexing (MySQL, PostgreSQL), file systems (NTFS, ext4)

Heaps

A heap is a complete binary tree satisfying the heap property.

Type	Property
Max-Heap	Parent ≥ Children (root is maximum)
Min-Heap	Parent ≤ Children (root is minimum)

Array representation:
- Parent of i: (i-1)/2
- Left child of i: 2i + 1
- Right child of i: 2i + 2

Operation	Time Complexity
Insert	O(log n)
Delete root	O(log n)
Find min/max	O(1)
Build heap	O(n)
Heapify	O(log n)

Applications: Priority queues, Heap sort, Dijkstra's algorithm, Prim's algorithm

Tries (Prefix Trees)

A trie is a tree-like data structure for storing strings where each node represents a character.

Properties:
- Root is empty
- Each path from root to a marked node represents a string
- Common prefixes share nodes

Operation	Time Complexity
Insert	O(L) where L = length of string
Search	O(L)
Delete	O(L)

Space: O(N × L × A) where N = number of strings, L = average length, A = alphabet size

Applications: Autocomplete, spell checkers, IP routing (longest prefix matching), word games

7. Graphs

A Graph G = (V, E) consists of a set of vertices (V) and edges (E).

Graph Terminology

Directed Graph (Digraph): Edges have direction
Undirected Graph: Edges are bidirectional
Weighted Graph: Edges have weights/costs
Simple Graph: No self-loops or multiple edges
Complete Graph: Every pair of vertices connected; n(n-1)/2 edges
Connected Graph: Path exists between every pair of vertices
Strongly Connected: Path exists in both directions (directed)
Bipartite Graph: Vertices divided into two sets; edges only between sets
Planar Graph: Can be drawn without edge crossings
Tree: Connected, acyclic graph with n-1 edges
DAG: Directed Acyclic Graph

Graph Representations

Representation	Space	Edge Check	Add Edge	Add Vertex
Adjacency Matrix	O(V²)	O(1)	O(1)	O(V²)
Adjacency List	O(V + E)	O(V)	O(1)	O(1)
Incidence Matrix	O(V × E)	O(E)	O(E)	O(V × E)

Graph Traversals

BFS (Breadth-First Search)

Uses Queue
Explores level by level
Time: O(V + E) with adjacency list, O(V²) with matrix
Space: O(V)
Applications: Shortest path (unweighted), level-order traversal, finding connected components

DFS (Depth-First Search)

Uses Stack (or recursion)
Explores as deep as possible before backtracking
Time: O(V + E) with adjacency list, O(V²) with matrix
Space: O(V)
Applications: Topological sort, cycle detection, strongly connected components, path finding

Key Graph Algorithms

Algorithm	Purpose	Time Complexity
Dijkstra	Single-source shortest path (non-negative weights)	O((V+E) log V) with min-heap
Bellman-Ford	Single-source shortest path (handles negative weights)	O(V × E)
Floyd-Warshall	All-pairs shortest path	O(V³)
Kruskal	Minimum Spanning Tree	O(E log E)
Prim	Minimum Spanning Tree	O((V+E) log V)
Topological Sort	Linear ordering of DAG	O(V + E)
Kosaraju/Tarjan	Strongly Connected Components	O(V + E)

8. Hashing

Hashing is a technique to map data to a fixed-size table using a hash function.

Hash Function

h(key) = key % table_size (division method)

Collision Resolution Techniques

Technique	Description	Avg Search	Worst Search
Chaining	Each bucket holds a linked list	O(1 + α)	O(n)
Linear Probing	h(k, i) = (h'(k) + i) % m	O(1/(1-α))	O(n)
Quadratic Probing	h(k, i) = (h'(k) + c₁i + c₂i²)	Better than linear	O(n)
Double Hashing	h(k, i) = (h₁(k) + i×h₂(k)) % m	Best distribution	O(n)

Where α (load factor) = n/m (n = elements, m = table size)

Good Hash Function Properties

Minimizes collisions
Uniform distribution
Easy to compute
Uses all input data

Applications

Database indexing
Symbol tables in compilers
Password storage (with cryptographic hash)
Caching (hash maps)
Deduplication

9. Sorting Algorithms

Comparison-Based Sorting

Algorithm	Best	Average	Worst	Space	Stable?
Bubble Sort	O(n)	O(n²)	O(n²)	O(1)	Yes
Selection Sort	O(n²)	O(n²)	O(n²)	O(1)	No
Insertion Sort	O(n)	O(n²)	O(n²)	O(1)	Yes
Merge Sort	O(n log n)	O(n log n)	O(n log n)	O(n)	Yes
Quick Sort	O(n log n)	O(n log n)	O(n²)	O(log n)	No
Heap Sort	O(n log n)	O(n log n)	O(n log n)	O(1)	No
Tim Sort	O(n)	O(n log n)	O(n log n)	O(n)	Yes

Non-Comparison Based Sorting

Algorithm	Best	Average	Worst	Space	Notes
Counting Sort	O(n+k)	O(n+k)	O(n+k)	O(k)	k = range of input
Radix Sort	O(d×(n+k))	O(d×(n+k))	O(d×(n+k))	O(n+k)	d = digits, k = base
Bucket Sort	O(n+k)	O(n+k)	O(n²)	O(n)	k = number of buckets

Key Sorting Concepts

Stable sort: Maintains relative order of equal elements
In-place sort: Uses O(1) extra space
Lower bound for comparison sort: Ω(n log n)
Quick Sort is generally fastest in practice (good cache performance)
Merge Sort is preferred for linked lists and external sorting
Heap Sort guarantees O(n log n) with O(1) space

10. Searching Algorithms

Linear Search

Sequentially check each element
Time: O(n)
Works on unsorted data

Binary Search

Requires sorted array
Repeatedly divide search space in half
Time: O(log n)
Space: O(1) iterative, O(log n) recursive

Interpolation Search

Improved binary search for uniformly distributed data
Time: O(log log n) average, O(n) worst
Position formula: low + ((key - arr[low]) × (high - low)) / (arr[high] - arr[low])

Comparison Table

Algorithm	Best	Average	Worst	Requirement
Linear Search	O(1)	O(n)	O(n)	None
Binary Search	O(1)	O(log n)	O(log n)	Sorted array
Interpolation	O(1)	O(log log n)	O(n)	Sorted + uniform

11. Complexity Summary Table

Data Structure	Access	Search	Insert	Delete	Space
Array	O(1)	O(n)	O(n)	O(n)	O(n)
Sorted Array	O(1)	O(log n)	O(n)	O(n)	O(n)
Stack	O(n)	O(n)	O(1)	O(1)	O(n)
Queue	O(n)	O(n)	O(1)	O(1)	O(n)
Singly Linked List	O(n)	O(n)	O(1)*	O(1)*	O(n)
Doubly Linked List	O(n)	O(n)	O(1)	O(1)	O(n)
BST (balanced)	O(log n)	O(log n)	O(log n)	O(log n)	O(n)
BST (worst)	O(n)	O(n)	O(n)	O(n)	O(n)
AVL Tree	O(log n)	O(log n)	O(log n)	O(log n)	O(n)
Hash Table	N/A	O(1)	O(1)	O(1)	O(n)
Heap	O(n)	O(n)	O(log n)	O(log n)	O(n)
Trie	O(L)	O(L)	O(L)	O(L)	O(N×L)

*At known position

12. Key Formulas and Theorems

Maximum nodes in binary tree of height h: 2^(h+1) - 1
Minimum nodes in binary tree of height h: h + 1
Height of complete binary tree with n nodes: ⌊log₂(n)⌋
Number of leaf nodes in full binary tree: (n + 1) / 2
Handshaking Lemma: Sum of all degrees = 2 × |E|
Euler's formula for planar graphs: V - E + F = 2
Maximum edges in simple graph: n(n-1)/2 (undirected), n(n-1) (directed)
B-Tree height: O(log_m(n)) where m = order
Hash table load factor: α = n/m
Master Theorem: T(n) = aT(n/b) + f(n)

13. Exam Tips

Always mention time AND space complexity in answers
Know the difference between array and linked list trade-offs
BST vs AVL: AVL guarantees O(log n) by balancing
BFS uses Queue, DFS uses Stack — frequently asked
Heap sort is NOT stable but is in-place
Merge sort is stable but requires O(n) extra space
Quick sort worst case O(n²) occurs when pivot is always min/max
Hash table average case is O(1) but worst case is O(n)
Trie search time depends on string length, not number of strings
Know all tree traversal orders and their applications

Practice Questions

8 MCQs for Data Structures and Applications with detailed explanations.

Q1. Consider the following about is a collection of elements of the same data type stored in ...

A. It applies only to sequential processing models
B. It is not applicable in distributed systems
C. is a collection of elements of the same data type stored in contiguous memory locations.

Key Properties

-
- D. It requires a minimum of O(n²) time complexity

✅ Correct Answer: Option C

Explanation:
The correct answer is Option C — is a collection of elements of the same data type stored in contiguous memory locations.

Key Properties

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option B (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.

Q2. Consider the following about is a linear collection of data elements (nodes) where each n...

A. is a linear collection of data elements (nodes) where each node points to the next.

Types of Linked Lists

| Type |
- B. It is not applicable in distributed systems
- C. It requires a minimum of O(n²) time complexity
- D. It applies only to sequential processing models

✅ Correct Answer: Option A

Explanation:
The correct answer is Option A — is a linear collection of data elements (nodes) where each node points to the next.

Types of Linked Lists

| Type |.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option B (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.

Q3. Consider the following about If '(' → push to stack

If ')' → pop and output until '(' ...
A. If '(' → push to stack
If ')' → pop and output until '(' is found
B. It applies only to sequential processing models
C. It is not applicable in distributed systems
D. It requires a minimum of O(n²) time complexity

✅ Correct Answer: Option A

Explanation:
The correct answer is Option A — If '(' → push to stack
4. If ')' → pop and output until '(' is found.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option B (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.

Q4. Consider the following about Operations

enqueue(x): Insert element at rear — O(1)
-...
A. It is not applicable in distributed systems
B. Operations
enqueue(x): Insert element at rear — O(1)
dequeue(): Remove element from front — O(1)
**front()
C. It requires a minimum of O(n²) time complexity
D. It applies only to sequential processing models

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — Operations
- enqueue(x): Insert element at rear — O(1)
- dequeue(): Remove element from front — O(1)
- **front().

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.

Q5. Consider the following about Memory address of element: `Address = Base_Address + (index ...

A. It requires a minimum of O(n²) time complexity
B. Memory address of element: Address = Base_Address + (index × size_of_element)
C. It is not applicable in distributed systems
D. It applies only to sequential processing models

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — Memory address of element: Address = Base_Address + (index × size_of_element).

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.

Q6. Consider the following about dimensional:** Simple linear array...

A. It is not applicable in distributed systems
B. dimensional:** Simple linear array
C. It requires a minimum of O(n²) time complexity
D. It applies only to sequential processing models

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — dimensional:** Simple linear array.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Q7. Consider the following about dimensional:** Matrix representation (rows × columns)...

A. It applies only to sequential processing models
B. dimensional:** Matrix representation (rows × columns)
C. It is not applicable in distributed systems
D. It requires a minimum of O(n²) time complexity

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — dimensional:** Matrix representation (rows × columns).

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.

Q8. Consider the following about dimensional:** Arrays of arrays...

A. It requires a minimum of O(n²) time complexity
B. dimensional:** Arrays of arrays
C. It applies only to sequential processing models
D. It is not applicable in distributed systems

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — dimensional:** Arrays of arrays.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.