Artificial Intelligence and Machine Learning

Table of Contents

  1. AI Fundamentals
  2. Intelligent Agents
  3. Search Techniques
  4. Knowledge Representation
  5. Expert Systems
  6. Machine Learning Types
  7. Key ML Algorithms
  8. Bias-Variance Tradeoff
  9. Overfitting and Underfitting
  10. Cross-Validation
  11. Evaluation Metrics
  12. Ensemble Methods
  13. Feature Engineering

AI Fundamentals

What is AI?

Artificial Intelligence is the simulation of human intelligence processes by computer systems. These processes include learning (acquiring information and rules), reasoning (using rules to reach conclusions), and self-correction.

AI Approaches

| Approach | Description | Example |
|----------|-------------|---------|
| Symbolic AI (GOFAI, Good Old-Fashioned AI) | Rule-based, explicit knowledge representation | Expert systems, logic programming |
| Connectionist AI | Learning from data via neural networks | Deep learning |
| Evolutionary AI | Inspired by biological evolution | Genetic algorithms |
| Statistical AI | Probabilistic reasoning and learning | Bayesian networks |

AI Levels

Turing Test

Proposed by Alan Turing in 1950: a machine passes if a human evaluator cannot reliably distinguish its responses from a human's. Searle's Chinese Room argument counters that passing the Turing Test does not imply genuine understanding.

Key AI Applications


Intelligent Agents

An agent is anything that perceives its environment through sensors and acts upon it through actuators.

Agent Types

Simple Reflex Agent

Model-Based Reflex Agent

Goal-Based Agent

Utility-Based Agent

Learning Agent

PEAS Framework

Describes an agent's task environment:
- Performance measure
- Environment
- Actuators
- Sensors

Example — Self-driving Car:
- Performance: Safe, fast, legal, comfortable travel
- Environment: Roads, traffic, pedestrians, weather
- Actuators: Steering, accelerator, brake, horn
- Sensors: Cameras, GPS, LIDAR, speedometer

Environment Properties

| Property | Description |
|----------|-------------|
| Fully Observable | Agent can see entire state |
| Deterministic | Actions have predictable outcomes |
| Static | Environment doesn't change while agent deliberates |
| Discrete | Finite set of percepts and actions |
| Single Agent | No other agents competing |

Search Techniques

Search is a fundamental AI technique for finding sequences of actions to achieve goals.

Search Problem Formulation

  1. State Space: All possible states
  2. Initial State: Starting state
  3. Goal Test: Determines if a state is the goal
  4. Successor Function: Defines possible actions and resulting states
  5. Path Cost: Cost function for evaluating solutions
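
The five components above can be sketched in code. The following is a minimal illustration, not a definitive implementation: the toy graph `GRAPH`, the goal state `"F"`, and the helper names `successors`, `is_goal`, and `bfs` are all assumptions made for this example, and every action is assumed to cost 1.

```python
from collections import deque

# Hypothetical toy state space: each key is a state, each value lists
# the states reachable from it in one action.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def successors(state):
    """Successor function: states reachable from `state` in one action."""
    return GRAPH[state]


def is_goal(state):
    """Goal test (the goal "F" is an assumption of this example)."""
    return state == "F"


def bfs(initial):
    """Breadth-first search: returns a path with the fewest actions, or None."""
    frontier = deque([[initial]])  # FIFO queue of paths
    visited = {initial}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


path = bfs("A")
path_cost = len(path) - 1  # path cost under the unit-step-cost assumption
print(path, path_cost)
```

Swapping the FIFO queue for a stack would turn the same skeleton into depth-first search, which is why the formulation is kept separate from the search strategy.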

Breadth-First Search (BFS)

Depth-First Search (DFS)

Depth-Limited Search (DLS)

Iterative Deepening Search (IDS)

Uniform Cost Search (UCS)

Heuristic Function h(n)

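Estimates the cost of the cheapest path from node n to the nearest goal. For A* to return an optimal solution, h(n) must be admissible (it never overestimates the true cost); graph-search versions of A* additionally rely on consistency.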
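
As an illustration of a heuristic in use, here is a small A* sketch on a grid, assuming 4-connected moves of cost 1 and walls marked `#`. The grid, the function names `manhattan` and `a_star`, and the choice of Manhattan distance as h(n) are assumptions of this example; Manhattan distance is admissible here because it never exceeds the number of unit moves still required.

```python
import heapq


def manhattan(a, b):
    """Heuristic h(n): Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(grid, start, goal):
    """Return the cost of a cheapest path from start to goal, or None.

    Nodes are expanded in order of f(n) = g(n) + h(n).
    """
    rows, cols = len(grid), len(grid[0])
    frontier = [(manhattan(start, goal), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale queue entry; a cheaper route was found already
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    f = new_g + manhattan((nr, nc), goal)
                    heapq.heappush(frontier, (f, new_g, (nr, nc)))
    return None


GRID = [
    "....",
    ".##.",
    "....",
]
cost = a_star(GRID, (0, 0), (2, 3))
print(cost)
```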

Properties of Heuristics

Hill Climbing

Simulated Annealing

Genetic Algorithm (GA)

Search Algorithm Comparison

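| Algorithm | Complete | Optimal | Time | Space | Type |
|-----------|----------|---------|------|-------|------|
| BFS | Yes | Yes (uniform step costs) | O(b^d) | O(b^d) | Uninformed |
| DFS | No | No | O(b^m) | O(bm) | Uninformed |
| UCS | Yes | Yes | O(b^(1+⌊C*/ε⌋)) | O(b^(1+⌊C*/ε⌋)) | Uninformed |
| Greedy | No | No | O(b^m) | O(b^m) | Informed |
| A* | Yes | Yes (admissible h) | O(b^d) | O(b^d) | Informed |
| Hill Climbing | No | No | Unbounded in the worst case | O(1) | Local |
| GA | No | No | Varies | Varies | Local |

Here b is the branching factor, d the depth of the shallowest goal, m the maximum depth of the search tree, C* the cost of the optimal solution, and ε the minimum step cost.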

Knowledge Representation

Approaches

Propositional Logic

First-Order Logic (Predicate Logic)

Semantic Network

Frames

Ontologies


Expert Systems

Architecture

  1. Knowledge Base: Facts and rules (IF-THEN) from domain experts
  2. Inference Engine: Applies rules to facts to derive conclusions
  3. Forward Chaining: Data-driven inference strategy (start from facts, apply rules)
  4. Backward Chaining: Goal-driven inference strategy (start from goal, find supporting facts)
  5. Working Memory: Current facts and intermediate results
  6. Explanation Facility: Explains reasoning process
  7. User Interface: Interaction with users
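
To make the interplay of knowledge base, inference engine, and working memory concrete, here is a minimal forward-chaining sketch. The rules and fact names (`has_fever`, `possible_flu`, and so on) are invented purely for illustration, and the recorded trace stands in for a very simple explanation facility.

```python
# Hypothetical knowledge base: each rule is (set of premises, conclusion).
RULES = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "has_body_ache"}, "recommend_rest"),
    ({"has_rash"}, "possible_allergy"),
]


def forward_chain(facts, rules):
    """Data-driven inference: fire rules until no new fact can be derived.

    Returns the final working memory and a trace of the rules that fired.
    """
    memory = set(facts)  # working memory
    trace = []           # which premises produced which conclusion
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= memory and conclusion not in memory:
                memory.add(conclusion)
                trace.append((sorted(premises), conclusion))
                changed = True
    return memory, trace


memory, trace = forward_chain({"has_fever", "has_cough", "has_body_ache"}, RULES)
for premises, conclusion in trace:
    print("IF", " AND ".join(premises), "THEN", conclusion)
```

Backward chaining would instead start from a goal such as `recommend_rest` and recursively look for rules whose conclusions match it.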

Advantages

Limitations


Machine Learning Types

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Semi-Supervised Learning

Self-Supervised Learning


Key ML Algorithms

Linear Regression

Logistic Regression

Decision Trees

Random Forest

Support Vector Machine (SVM)

k-Nearest Neighbors (k-NN)

k-Means Clustering

Neural Networks Basics

Convolutional Neural Networks (CNN)

Recurrent Neural Networks (RNN)


Bias-Variance Tradeoff

Decomposition of Error

Total Error = Bias² + Variance + Irreducible Error

| Component | Description | Cause |
|-----------|-------------|-------|
| Bias | Error from overly simplistic assumptions | Underfitting (model too simple) |
| Variance | Error from sensitivity to training data fluctuations | Overfitting (model too complex) |
| Irreducible Error | Noise inherent in data | Cannot be reduced by any model |
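
The decomposition can be written out for squared error at a single input x. The notation is an assumption of this sketch and does not come from the notes: the data are modeled as y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², f̂ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

```latex
\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible Error}}
\]
```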

The Tradeoff

Managing the Tradeoff


Overfitting and Underfitting

Underfitting

Overfitting

Regularization Techniques

| Technique | Description |
|-----------|-------------|
| L1 (Lasso) | Adds λΣ\|wᵢ\| penalty; can shrink some coefficients exactly to zero |
| L2 (Ridge) | Adds λΣwᵢ² penalty; shrinks coefficients |
| Elastic Net | Combines L1 and L2 |
| Dropout | Randomly disable neurons during training (NN) |
| Early Stopping | Stop training when validation error starts to increase |
| Data Augmentation | Artificially increase training data |
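
The penalty terms can be computed directly. This is a small sketch in which L1 is taken as λ times the sum of absolute weights and L2 as λ times the sum of squared weights; the `l1_ratio` mixing parameter for Elastic Net is one common convention and is an assumption here, as are the example weights.

```python
def l1_penalty(weights, lam):
    """L1 (Lasso) penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (Ridge) penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def elastic_net_penalty(weights, lam, l1_ratio):
    """Mix of L1 and L2; l1_ratio in [0, 1] is an assumed convention."""
    return (l1_ratio * l1_penalty(weights, lam)
            + (1 - l1_ratio) * l2_penalty(weights, lam))


weights = [0.5, -2.0, 0.0, 1.5]  # illustrative coefficients
lam = 0.1

print(l1_penalty(weights, lam))
print(l2_penalty(weights, lam))
print(elastic_net_penalty(weights, lam, l1_ratio=0.5))
```

During training the chosen penalty is added to the data loss, so larger λ pushes the optimizer toward smaller weights.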

Cross-Validation

Purpose

Estimate model performance on unseen data and tune hyperparameters.

Types

k-Fold Cross-Validation

  1. Split data into k equal folds
  2. Train on k-1 folds, test on remaining fold
  3. Repeat k times (each fold used as test once)
  4. Average the k performance scores
  5. Common: k = 5 or k = 10
  6. Advantage: Every data point used for both training and testing
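
The procedure above can be sketched with plain index lists. The function names `k_fold_indices` and `cross_validate` and the `score_fn` callback are hypothetical; folds are contiguous and unshuffled to keep the example short, whereas in practice the data are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, k, score_fn):
    """Average score_fn(train_idx, test_idx) over the k folds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A real `score_fn` would fit a model on the training indices and return its score on the test indices.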

Stratified k-Fold

Leave-One-Out Cross-Validation (LOOCV)

Holdout Method


Evaluation Metrics

Classification Metrics

Confusion Matrix

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |

Key Metrics

| Metric | Formula | Interpretation |
|--------|---------|----------------|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | Overall correctness |
| Precision | TP/(TP+FP) | Of predicted positives, how many are correct |
| Recall (Sensitivity) | TP/(TP+FN) | Of actual positives, how many are detected |
| Specificity | TN/(TN+FP) | Of actual negatives, how many are correctly identified |
| F1 Score | 2×(Precision×Recall)/(Precision+Recall) | Harmonic mean of precision and recall |
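
A short worked example of these formulas, using made-up confusion-matrix counts. The function name `classification_metrics` is hypothetical, and the sketch assumes no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics in the table from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only.
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts precision (0.8) is higher than recall (about 0.667), and F1 (8/11, about 0.727) falls between the two.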

When to Use What?

ROC Curve and AUC

Regression Metrics

| Metric | Formula | Interpretation |
|--------|---------|----------------|
| MAE | (1/n)Σ\|yᵢ - ŷᵢ\| | Average absolute error |
| MSE | (1/n)Σ(yᵢ - ŷᵢ)² | Average squared error (penalizes large errors) |
| RMSE | √MSE | Same units as target variable |
| R² (R-squared) | 1 - (SS_res/SS_tot) | Proportion of variance explained (at most 1; can be negative for a model worse than predicting the mean) |
| Adjusted R² | 1 - [(1-R²)(n-1)/(n-p-1)] | R² adjusted for number of predictors |
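
The regression formulas can be checked on a tiny made-up dataset. The function name `regression_metrics` and the sample values are assumptions for illustration; `n_predictors` plays the role of p in the adjusted R² formula.

```python
import math


def regression_metrics(y_true, y_pred, n_predictors):
    """Compute MAE, MSE, RMSE, R² and adjusted R² from paired values."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse),
            "r2": r2, "adj_r2": adj_r2}


# Illustrative values only.
m = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0], n_predictors=1)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```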

Ensemble Methods

Ensemble methods combine multiple models to improve performance.

Bagging (Bootstrap Aggregating)

Boosting

AdaBoost (Adaptive Boosting)

  1. Initialize equal weights for all training samples
  2. Train weak learner (e.g., decision stump)
  3. Increase weights of misclassified samples
  4. Assign weight to learner based on accuracy
  5. Repeat; final prediction = weighted vote
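
The five steps can be traced on a toy problem. This is a minimal sketch, assuming 1-D inputs, labels in {+1, -1}, and exhaustive search over threshold stumps; the dataset and the names `best_stump`, `adaboost`, and `predict` are invented for the example, and the learner weight uses the common formula alpha = 0.5 * ln((1 - err) / err).

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` left of the threshold, else its negation."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    points = sorted(set(xs))
    thresholds = ([points[0] - 1]
                  + [(a + b) / 2 for a, b in zip(points, points[1:])]
                  + [points[-1] + 1])
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, thr, pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost(xs, ys, rounds):
    """Train an ensemble of (alpha, threshold, polarity) stumps."""
    n = len(xs)
    weights = [1.0 / n] * n                      # step 1: equal weights
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = best_stump(xs, ys, weights)   # step 2: weak learner
        err = min(max(err, 1e-10), 1 - 1e-10)         # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)       # step 4: learner weight
        ensemble.append((alpha, thr, pol))
        # Step 3: misclassified samples (y * h = -1) gain weight, others lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Step 5: sign of the alpha-weighted vote."""
    score = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # not separable by any single stump
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single stump can classify this dataset, but the weighted vote of three stumps does, which is the point of combining weak learners.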

Gradient Boosting

XGBoost

Stacking

Ensemble Comparison

| Method | Strategy | Reduces | Parallel? |
|--------|----------|---------|-----------|
| Bagging | Independent bootstrap samples | Variance | Yes |
| Boosting | Sequential error correction | Bias | No |
| Stacking | Meta-learner on base models | Both | Partially |

Feature Engineering

Feature engineering is the process of creating/transforming features to improve model performance.

Feature Selection Methods

| Method | Description |
|--------|-------------|
| Filter | Statistical measures (correlation, chi-square, mutual information) |
| Wrapper | Use model performance to select features (forward selection, backward elimination) |
| Embedded | Selection during model training (Lasso, tree-based importance) |

Feature Transformation

| Technique | Description |
|-----------|-------------|
| Normalization | Scale to [0,1]: x' = (x - min)/(max - min) |
| Standardization | Scale to mean=0, std=1: x' = (x - μ)/σ |
| Log Transform | Reduces skewness in right-skewed data |
| Binning | Convert continuous to categorical |
| One-Hot Encoding | Convert categorical to binary columns |
| Label Encoding | Convert categories to integers (for ordinal data) |
| PCA | Principal Component Analysis: reduce dimensions while preserving variance |
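
Three of these transformations are simple enough to sketch in plain Python. The function names and sample values are illustrative assumptions; `standardize` uses the population standard deviation, and both scalers assume the input is not constant.

```python
import math


def min_max_normalize(values):
    """Normalization: x' = (x - min) / (max - min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: x' = (x - mean) / std, using the population std."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def one_hot(categories):
    """One-hot encoding: one binary column per distinct category (sorted order)."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


normalized = min_max_normalize([10.0, 20.0, 40.0])
standardized = standardize([2.0, 4.0, 6.0])
encoded = one_hot(["red", "green", "red"])
print(normalized)
print(standardized)
print(encoded)
```

In a real pipeline the min, max, mean, and std would be computed on the training split only and then reused on validation and test data.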

Handling Missing Data

Handling Imbalanced Data


Key Formulas Summary

| Concept | Formula |
|---------|---------|
| A* Search | f(n) = g(n) + h(n) |
| Sigmoid | σ(z) = 1/(1 + e^(-z)) |
| MSE | (1/n) Σ(yᵢ - ŷᵢ)² |
| Entropy | H(S) = -Σ pᵢ log₂(pᵢ) |
| Gini Index | Gini = 1 - Σ pᵢ² |
| Precision | TP / (TP + FP) |
| Recall | TP / (TP + FN) |
| F1 Score | 2 × (P × R) / (P + R) |
| R² | 1 - (SS_res / SS_tot) |
| Simulated Annealing | P(accept) = e^(-ΔE/T) |
| Error Decomposition | Bias² + Variance + Irreducible |
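
A few of these formulas evaluated numerically, as a quick sanity check. The function names are hypothetical, and `sa_accept_probability` assumes the usual convention that ΔE > 0 denotes a worsening move while improving moves are always accepted.

```python
import math


def sigmoid(z):
    """Sigmoid: 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))


def entropy(probs):
    """Entropy H(S) = -sum(p * log2(p)); zero-probability classes contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gini(probs):
    """Gini index: 1 - sum(p^2)."""
    return 1 - sum(p * p for p in probs)


def sa_accept_probability(delta_e, temperature):
    """Simulated annealing: accept a worsening move with probability e^(-dE/T)."""
    if delta_e <= 0:
        return 1.0  # improving (or equal) moves are always accepted
    return math.exp(-delta_e / temperature)


print(sigmoid(0.0))                      # midpoint of the sigmoid
print(entropy([0.5, 0.5]))               # maximally impure binary split
print(gini([0.5, 0.5]))
print(sa_accept_probability(1.0, 1.0))
print(sa_accept_probability(1.0, 0.1))   # lower temperature, lower acceptance
```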

Exam Tips

  1. Search Algorithms: Know BFS vs DFS vs A* — completeness, optimality, complexity
  2. A* Optimality: Understand admissibility and consistency conditions
  3. ML Types: Clearly distinguish supervised, unsupervised, and reinforcement learning
  4. SVM: Understand margin, support vectors, and kernel trick
  5. Bias-Variance: Know the tradeoff and how to manage it
  6. Evaluation Metrics: Know when to use precision vs recall vs F1
  7. Ensemble Methods: Bagging vs Boosting — key differences
  8. Neural Networks: Understand backpropagation, activation functions, CNN vs RNN
  9. Decision Trees: Know splitting criteria (entropy, Gini, information gain)
  10. Cross-Validation: Understand k-fold and stratified k-fold

Practice Questions

10 MCQs for Artificial Intelligence and Machine Learning with detailed explanations.

Q1. For the search algorithm whose time complexity is O(b^d), which of the following best describes its overhead?

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B: negligible.

This concept is covered under Artificial Intelligence and Machine Learning in the CBDT Assistant Director Systems syllabus. The answer is established through standard definitions and widely accepted principles in the field.

Why other options are incorrect:
- Options A, C, and D are factually incorrect or describe a concept from a different domain.


Q2. In the definition of AI, which process is described as "using rules to reach conclusions"?

✅ Correct Answer: Option D

Explanation:
The correct answer is Option D: reasoning, defined in AI Fundamentals as using rules to reach conclusions.

Why other options are incorrect:
- Options A, B, and C are factually incorrect or describe a concept from a different domain.


Q3. In the definition of AI, which process is described as "acquiring information and rules"?

✅ Correct Answer: Option D

Explanation:
The correct answer is Option D: learning, defined in AI Fundamentals as acquiring information and rules.

Why other options are incorrect:
- Options A, B, and C are factually incorrect or describe a concept from a different domain.


Q4. Which AI approach is inspired by biological evolution, with genetic algorithms as an example?

✅ Correct Answer: Option C

Explanation:
The correct answer is Option C: Evolutionary AI, which is inspired by biological evolution (example: genetic algorithms).

Why other options are incorrect:
- Options A, B, and D are factually incorrect or describe a concept from a different domain.


Q5. Which of the following best describes the output of a classification model?

✅ Correct Answer: Option C

Explanation:
The correct answer is Option C: a category (e.g., spam/not spam).

Why other options are incorrect:
- Options A, B, and D are factually incorrect or describe a concept from a different domain.


Q6. Which of the following best describes search in AI?

✅ Correct Answer: Option D

Explanation:
The correct answer is Option D: a fundamental AI technique for finding sequences of actions to achieve goals.

Why other options are incorrect:
- Options A, B, and C are factually incorrect or describe a concept from a different domain.


Q7. Which of the following best describes the output of a regression model?

✅ Correct Answer: Option B

Explanation:
The correct answer is Option B: a continuous value (e.g., house price).

Why other options are incorrect:
- Options A, C, and D are factually incorrect or describe a concept from a different domain.


Q8. Which AI approach is rule-based with explicit knowledge representation, with expert systems and logic programming as examples?

✅ Correct Answer: Option C

Explanation:
The correct answer is Option C: Symbolic AI, which is rule-based and uses explicit knowledge representation (examples: expert systems, logic programming).

Why other options are incorrect:
- Options A, B, and D are factually incorrect or describe a concept from a different domain.


Q9. Which level of AI is described as human-level intelligence across all domains and is still theoretical?

✅ Correct Answer: Option D

Explanation:
The correct answer is Option D: the level with human-level intelligence across all domains, which remains theoretical (commonly called General AI or AGI).

Why other options are incorrect:
- Options A, B, and C are factually incorrect or describe a concept from a different domain.


Q10. Regarding the AI Approaches table (columns: Approach, Description, Example), which statement is correct?

- A. AI Approaches: Approach, Description, Example
- B. This is defined exclusively at the physical layer of system design
- C. This approach has been deprecated in all modern implementations
- D. This concept applies only to analog systems and not digital ones

✅ Correct Answer: Option A

Explanation:
The correct answer is Option A, which restates the heading of the AI Approaches table (Approach, Description, Example).

Why other options are incorrect:
- Options B, C, and D are factually incorrect or describe a concept from a different domain.