Data Structures and Applications

Comprehensive Theory Notes for CBDT Assistant Director (Systems) Exam


1. Introduction to Data Structures

A data structure is a systematic way of organizing, storing, and managing data so that it can be accessed and modified efficiently. Data structures are the backbone of computer science and software engineering.

Classification of Data Structures


2. Arrays

An array is a collection of elements of the same data type stored in contiguous memory locations.

Key Properties

  • Fixed size (static allocation)
  • Random access via index: O(1)
  • Homogeneous elements
  • Memory address of element: Address = Base_Address + (index × size_of_element)

Operations and Complexities

Operation Best Average Worst
Access O(1) O(1) O(1)
Search O(1) O(n) O(n)
Insertion O(1) O(n) O(n)
Deletion O(1) O(n) O(n)

Types of Arrays

  • One-dimensional: Simple linear array
  • Two-dimensional: Matrix representation (rows × columns)
  • Multi-dimensional: Arrays of arrays

Applications

  • Implementing other data structures (stacks, queues, heaps)
  • Lookup tables and matrices in scientific computing
  • String storage (character arrays)

3. Linked Lists

A linked list is a linear collection of data elements (nodes) where each node points to the next.

Types of Linked Lists

Type Description
Singly Linked List Each node points to next node only
Doubly Linked List Each node has pointers to both next and previous
Circular Linked List Last node points back to first node
Doubly Circular Combination of doubly and circular

Operations and Complexities

Operation Singly Doubly
Insert at head O(1) O(1)
Insert at tail O(n) / O(1)* O(1)
Delete head O(1) O(1)
Search O(n) O(n)
Random Access O(n) O(n)

*O(1) if tail pointer is maintained

Advantages over Arrays

  • Dynamic size (no need to pre-allocate)
  • Efficient insertion/deletion at any position
  • No memory wastage

Disadvantages

  • No random access
  • Extra memory for pointers
  • Poor cache locality

Applications

  • Dynamic memory allocation
  • Implementing stacks, queues, and hash tables
  • Polynomial arithmetic
  • Undo functionality in applications

4. Stacks

A Stack is a linear data structure following LIFO (Last In, First Out) principle.

Operations

  • push(x): Insert element x at top — O(1)
  • pop(): Remove and return top element — O(1)
  • peek()/top(): Return top element without removing — O(1)
  • isEmpty(): Check if stack is empty — O(1)

Implementations

  • Array-based: Fixed or dynamic size; may overflow
  • Linked list-based: No size limit; uses extra pointer memory

Key Concepts

  • Overflow: Push on a full stack
  • Underflow: Pop from an empty stack
  • Top pointer: Always points to the most recently added element

Applications

  • Expression evaluation and conversion (infix ↔ postfix ↔ prefix)
  • Parenthesis matching and syntax parsing
  • Function call management (call stack / activation records)
  • Undo/Redo operations in editors
  • Backtracking algorithms (maze solving, N-Queens)
  • Tower of Hanoi problem
  • DFS (Depth-First Search) implementation

Infix to Postfix Conversion (Algorithm)

  1. Scan the infix expression left to right
  2. If operand → add to output
  3. If '(' → push to stack
  4. If ')' → pop and output until '(' is found
  5. If operator → pop operators with higher or equal precedence, then push current operator
  6. After scanning, pop all remaining operators

5. Queues

A Queue is a linear data structure following FIFO (First In, First Out) principle.

Operations

  • enqueue(x): Insert element at rear — O(1)
  • dequeue(): Remove element from front — O(1)
  • front(): Return front element — O(1)
  • isEmpty(): Check if queue is empty — O(1)

Types of Queues

Type Description
Simple Queue Standard FIFO
Circular Queue Last position connects to first; solves memory wastage
Priority Queue Elements served by priority, not arrival order
Deque (Double-ended) Insert/delete from both ends
Input-restricted Deque Insert at one end, delete from both
Output-restricted Deque Insert at both ends, delete from one

Circular Queue

  • Solves the problem of unused space in array-based queues
  • Full condition: (REAR + 1) % n == FRONT
  • Empty condition: REAR == FRONT
  • Next position: (REAR + 1) % n

Applications

  • CPU scheduling (Round Robin, FCFS)
  • Disk scheduling algorithms
  • BFS (Breadth-First Search) traversal
  • Spooling in print management
  • Message queues in distributed systems
  • Handling requests in web servers

6. Trees

A Tree is a hierarchical, non-linear data structure consisting of nodes with a root and subtrees.

Terminology

  • Root: Topmost node (no parent)
  • Parent/Child: Direct ancestor/descendant relationship
  • Leaf: Node with no children
  • Degree: Number of children of a node
  • Height: Longest path from node to leaf
  • Level: Root is at level 0; children at level 1, etc.
  • Forest: Collection of disjoint trees

Binary Tree

Each node has at most 2 children (left and right).

Maximum nodes at level i: 2^i
Maximum nodes in tree of height h: 2^(h+1) - 1
Minimum nodes in tree of height h: h + 1

Types of Binary Trees

Type Description
Full/Strict Every node has 0 or 2 children
Complete All levels filled except possibly last; filled left to right
Perfect All internal nodes have 2 children; all leaves at same level
Degenerate/Pathological Each parent has exactly one child (like linked list)
Balanced Height difference of left and right subtrees ≤ 1

Tree Traversals

Traversal Order Use Case
Inorder Left → Root → Right BST gives sorted order
Preorder Root → Left → Right Create copy, prefix expression
Postorder Left → Right → Root Delete tree, postfix expression
Level-order Level by level (BFS) Level-wise processing

Binary Search Tree (BST)

  • Left subtree contains only nodes with values less than root
  • Right subtree contains only values greater than root
  • Both subtrees are also BSTs
Operation Average Worst (skewed)
Search O(log n) O(n)
Insert O(log n) O(n)
Delete O(log n) O(n)

BST Deletion Cases:
1. Leaf node: Simply remove
2. One child: Replace with child
3. Two children: Replace with inorder successor (smallest in right subtree) or inorder predecessor (largest in left subtree)

AVL Trees (Adelson-Vland and Landis)

Self-balancing BST where the balance factor (height of left subtree - height of right subtree) is in {-1, 0, 1}.

Rotations for Rebalancing:
| Case | Condition | Rotation |
|------|-----------|----------|
| LL (Left-Left) | Left-heavy, left child left-heavy | Right Rotation |
| RR (Right-Right) | Right-heavy, right child right-heavy | Left Rotation |
| LR (Left-Right) | Left-heavy, left child right-heavy | Left-Right Rotation |
| RL (Right-Left) | Right-heavy, right child left-heavy | Right-Left Rotation |

Guaranteed height: h ≤ 1.44 log₂(n) → all operations O(log n)

B-Trees

Self-balancing search tree designed for disk-based storage (databases, file systems).

Properties of B-Tree of order m:
- Each node has at most m children
- Each non-root node has at least ⌈m/2⌉ children
- Root has at least 2 children (if not leaf)
- All leaves at same level
- A non-leaf node with k children contains k-1 keys

B+ Trees

Variant of B-tree where all data pointers are at leaf level, and leaves are linked.

Advantages over B-Tree:
- More keys per node (smaller pointers at internal nodes)
- Sequential access via linked leaves
- Better for range queries

Used in: Database indexing (MySQL, PostgreSQL), file systems (NTFS, ext4)

Heaps

A heap is a complete binary tree satisfying the heap property.

Type Property
Max-Heap Parent ≥ Children (root is maximum)
Min-Heap Parent ≤ Children (root is minimum)

Array representation:
- Parent of i: (i-1)/2
- Left child of i: 2i + 1
- Right child of i: 2i + 2

Operation Time Complexity
Insert O(log n)
Delete root O(log n)
Find min/max O(1)
Build heap O(n)
Heapify O(log n)

Applications: Priority queues, Heap sort, Dijkstra's algorithm, Prim's algorithm

Tries (Prefix Trees)

A trie is a tree-like data structure for storing strings where each node represents a character.

Properties:
- Root is empty
- Each path from root to a marked node represents a string
- Common prefixes share nodes

Operation Time Complexity
Insert O(L) where L = length of string
Search O(L)
Delete O(L)

Space: O(N × L × A) where N = number of strings, L = average length, A = alphabet size

Applications: Autocomplete, spell checkers, IP routing (longest prefix matching), word games


7. Graphs

A Graph G = (V, E) consists of a set of vertices (V) and edges (E).

Graph Terminology

  • Directed Graph (Digraph): Edges have direction
  • Undirected Graph: Edges are bidirectional
  • Weighted Graph: Edges have weights/costs
  • Simple Graph: No self-loops or multiple edges
  • Complete Graph: Every pair of vertices connected; n(n-1)/2 edges
  • Connected Graph: Path exists between every pair of vertices
  • Strongly Connected: Path exists in both directions (directed)
  • Bipartite Graph: Vertices divided into two sets; edges only between sets
  • Planar Graph: Can be drawn without edge crossings
  • Tree: Connected, acyclic graph with n-1 edges
  • DAG: Directed Acyclic Graph

Graph Representations

Representation Space Edge Check Add Edge Add Vertex
Adjacency Matrix O(V²) O(1) O(1) O(V²)
Adjacency List O(V + E) O(V) O(1) O(1)
Incidence Matrix O(V × E) O(E) O(E) O(V × E)

Graph Traversals

  • Uses Queue
  • Explores level by level
  • Time: O(V + E) with adjacency list, O(V²) with matrix
  • Space: O(V)
  • Applications: Shortest path (unweighted), level-order traversal, finding connected components
  • Uses Stack (or recursion)
  • Explores as deep as possible before backtracking
  • Time: O(V + E) with adjacency list, O(V²) with matrix
  • Space: O(V)
  • Applications: Topological sort, cycle detection, strongly connected components, path finding

Key Graph Algorithms

Algorithm Purpose Time Complexity
Dijkstra Single-source shortest path (non-negative weights) O((V+E) log V) with min-heap
Bellman-Ford Single-source shortest path (handles negative weights) O(V × E)
Floyd-Warshall All-pairs shortest path O(V³)
Kruskal Minimum Spanning Tree O(E log E)
Prim Minimum Spanning Tree O((V+E) log V)
Topological Sort Linear ordering of DAG O(V + E)
Kosaraju/Tarjan Strongly Connected Components O(V + E)

8. Hashing

Hashing is a technique to map data to a fixed-size table using a hash function.

Hash Function

h(key) = key % table_size (division method)

Collision Resolution Techniques

Technique Description Avg Search Worst Search
Chaining Each bucket holds a linked list O(1 + α) O(n)
Linear Probing h(k, i) = (h'(k) + i) % m O(1/(1-α)) O(n)
Quadratic Probing h(k, i) = (h'(k) + c₁i + c₂i²) Better than linear O(n)
Double Hashing h(k, i) = (h₁(k) + i×h₂(k)) % m Best distribution O(n)

Where α (load factor) = n/m (n = elements, m = table size)

Good Hash Function Properties

  • Minimizes collisions
  • Uniform distribution
  • Easy to compute
  • Uses all input data

Applications

  • Database indexing
  • Symbol tables in compilers
  • Password storage (with cryptographic hash)
  • Caching (hash maps)
  • Deduplication

9. Sorting Algorithms

Comparison-Based Sorting

Algorithm Best Average Worst Space Stable?
Bubble Sort O(n) O(n²) O(n²) O(1) Yes
Selection Sort O(n²) O(n²) O(n²) O(1) No
Insertion Sort O(n) O(n²) O(n²) O(1) Yes
Merge Sort O(n log n) O(n log n) O(n log n) O(n) Yes
Quick Sort O(n log n) O(n log n) O(n²) O(log n) No
Heap Sort O(n log n) O(n log n) O(n log n) O(1) No
Tim Sort O(n) O(n log n) O(n log n) O(n) Yes

Non-Comparison Based Sorting

Algorithm Best Average Worst Space Notes
Counting Sort O(n+k) O(n+k) O(n+k) O(k) k = range of input
Radix Sort O(d×(n+k)) O(d×(n+k)) O(d×(n+k)) O(n+k) d = digits, k = base
Bucket Sort O(n+k) O(n+k) O(n²) O(n) k = number of buckets

Key Sorting Concepts

  • Stable sort: Maintains relative order of equal elements
  • In-place sort: Uses O(1) extra space
  • Lower bound for comparison sort: Ω(n log n)
  • Quick Sort is generally fastest in practice (good cache performance)
  • Merge Sort is preferred for linked lists and external sorting
  • Heap Sort guarantees O(n log n) with O(1) space

10. Searching Algorithms

  • Sequentially check each element
  • Time: O(n)
  • Works on unsorted data
  • Requires sorted array
  • Repeatedly divide search space in half
  • Time: O(log n)
  • Space: O(1) iterative, O(log n) recursive

Comparison Table

Algorithm Best Average Worst Requirement
Linear Search O(1) O(n) O(n) None
Binary Search O(1) O(log n) O(log n) Sorted array
Interpolation O(1) O(log log n) O(n) Sorted + uniform

11. Complexity Summary Table

Data Structure Access Search Insert Delete Space
Array O(1) O(n) O(n) O(n) O(n)
Sorted Array O(1) O(log n) O(n) O(n) O(n)
Stack O(n) O(n) O(1) O(1) O(n)
Queue O(n) O(n) O(1) O(1) O(n)
Singly Linked List O(n) O(n) O(1)* O(1)* O(n)
Doubly Linked List O(n) O(n) O(1) O(1) O(n)
BST (balanced) O(log n) O(log n) O(log n) O(log n) O(n)
BST (worst) O(n) O(n) O(n) O(n) O(n)
AVL Tree O(log n) O(log n) O(log n) O(log n) O(n)
Hash Table N/A O(1) O(1) O(1) O(n)
Heap O(n) O(n) O(log n) O(log n) O(n)
Trie O(L) O(L) O(L) O(L) O(N×L)

*At known position


12. Key Formulas and Theorems

  1. Maximum nodes in binary tree of height h: 2^(h+1) - 1
  2. Minimum nodes in binary tree of height h: h + 1
  3. Height of complete binary tree with n nodes: ⌊log₂(n)⌋
  4. Number of leaf nodes in full binary tree: (n + 1) / 2
  5. Handshaking Lemma: Sum of all degrees = 2 × |E|
  6. Euler's formula for planar graphs: V - E + F = 2
  7. Maximum edges in simple graph: n(n-1)/2 (undirected), n(n-1) (directed)
  8. B-Tree height: O(log_m(n)) where m = order
  9. Hash table load factor: α = n/m
  10. Master Theorem: T(n) = aT(n/b) + f(n)

13. Exam Tips


Practice Questions

8 MCQs for Data Structures and Applications with detailed explanations.

Q1. Consider the following about is a collection of elements of the same data type stored in ...

Key Properties

-
- D. It requires a minimum of O(n²) time complexity

✅ Correct Answer: Option C

Explanation:
The correct answer is Option C — is a collection of elements of the same data type stored in contiguous memory locations.

Key Properties

-.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option B (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.


Q2. Consider the following about is a linear collection of data elements (nodes) where each n...

Types of Linked Lists

| Type |
- B. It is not applicable in distributed systems
- C. It requires a minimum of O(n²) time complexity
- D. It applies only to sequential processing models

✅ Correct Answer: Option A

Explanation:
The correct answer is Option A — is a linear collection of data elements (nodes) where each node points to the next.

Types of Linked Lists

| Type |.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option B (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.


Q3. Consider the following about If '(' → push to stack

  1. If ')' → pop and output until '(' ...

  2. A. If '(' → push to stack

  3. If ')' → pop and output until '(' is found
  4. B. It applies only to sequential processing models
  5. C. It is not applicable in distributed systems
  6. D. It requires a minimum of O(n²) time complexity

✅ Correct Answer: Option A

Explanation:
The correct answer is Option A — If '(' → push to stack
4. If ')' → pop and output until '(' is found.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option B (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.


Q4. Consider the following about Operations

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — Operations
- enqueue(x): Insert element at rear — O(1)
- dequeue(): Remove element from front — O(1)
- **front().

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.


Q5. Consider the following about Memory address of element: `Address = Base_Address + (index ...

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — Memory address of element: Address = Base_Address + (index × size_of_element).

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.


Q6. Consider the following about dimensional:** Simple linear array...

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — dimensional:** Simple linear array.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.


Q7. Consider the following about dimensional:** Matrix representation (rows × columns)...

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — dimensional:** Matrix representation (rows × columns).

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.


Q8. Consider the following about dimensional:** Arrays of arrays...

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B — dimensional:** Arrays of arrays.

This is a standard concept in Data Structures and Applications. The answer follows from the fundamental definitions and properties covered in this topic.

Why other options are incorrect:
- Option A (It requires a minimum of O(n²) time complexity...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option C (It applies only to sequential processing models...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.
- Option D (It is not applicable in distributed systems...) — This does not correctly answer the question as it either misstates the concept or refers to a different context.