Artificial Intelligence and Machine Learning
Table of Contents
- AI Fundamentals
- Intelligent Agents
- Search Techniques
- Knowledge Representation
- Expert Systems
- Machine Learning Types
- Key ML Algorithms
- Bias-Variance Tradeoff
- Overfitting and Underfitting
- Cross-Validation
- Evaluation Metrics
- Ensemble Methods
- Feature Engineering
AI Fundamentals
What is AI?
Artificial Intelligence is the simulation of human intelligence processes by computer systems. These processes include learning (acquiring information and rules), reasoning (using rules to reach conclusions), and self-correction.
AI Approaches
| Approach | Description | Example |
|---|---|---|
| Symbolic AI (GOFAI, Good Old-Fashioned AI) | Rule-based, explicit knowledge representation | Expert systems, logic programming |
| Connectionist AI | Learning from data via neural networks | Deep learning |
| Evolutionary AI | Inspired by biological evolution | Genetic algorithms |
| Statistical AI | Probabilistic reasoning and learning | Bayesian networks |
AI Levels
- Narrow AI (Weak AI): Specialized in one task (e.g., Siri, Chess engines) — all current AI
- General AI (Strong AI): Human-level intelligence across all domains — theoretical
- Super AI: Surpasses human intelligence — purely hypothetical
Turing Test
Proposed by Alan Turing (1950) — a machine passes if a human evaluator cannot distinguish its responses from a human's. The Chinese Room Argument (Searle) claims that passing the Turing Test doesn't imply true understanding.
Key AI Applications
- Natural Language Processing (NLP)
- Computer Vision
- Robotics
- Speech Recognition
- Recommendation Systems
- Autonomous Vehicles
- Healthcare Diagnostics
- Fraud Detection
Intelligent Agents
An agent is anything that perceives its environment through sensors and acts upon it through actuators.
Agent Types
Simple Reflex Agent
- Based on condition-action rules (if-then)
- Only considers current percept
- Works only in fully observable environments
- Example: Thermostat (if temp > threshold → turn heating off)
Model-Based Reflex Agent
- Maintains an internal model of the world
- Can handle partially observable environments
- Tracks state based on percept history
Goal-Based Agent
- Uses goal information to decide actions
- Can plan sequences of actions to reach goals
- More flexible than reflex agents
Utility-Based Agent
- Maximizes a utility function (preference measure)
- Chooses actions that yield highest expected utility
- Handles trade-offs between conflicting goals
Learning Agent
- Has a learning element that improves performance over time
- Components: learning element, performance element, critic, problem generator
- Can operate in unknown environments
PEAS Framework
Describes an agent's task environment:
- Performance measure
- Environment
- Actuators
- Sensors
Example — Self-driving Car:
- Performance: Safe, fast, legal, comfortable travel
- Environment: Roads, traffic, pedestrians, weather
- Actuators: Steering, accelerator, brake, horn
- Sensors: Cameras, GPS, LIDAR, speedometer
Environment Properties
| Property | Description |
|---|---|
| Fully Observable | Agent can see entire state |
| Deterministic | Actions have predictable outcomes |
| Static | Environment doesn't change while agent deliberates |
| Discrete | Finite set of percepts and actions |
| Single Agent | No other agents competing |
Search Techniques
Search is a fundamental AI technique for finding sequences of actions to achieve goals.
Search Problem Formulation
- State Space: All possible states
- Initial State: Starting state
- Goal Test: Determines if a state is the goal
- Successor Function: Defines possible actions and resulting states
- Path Cost: Cost function for evaluating solutions
Uninformed (Blind) Search
Breadth-First Search (BFS)
- Explores all nodes at depth d before depth d+1
- Uses FIFO queue
- Complete: Yes (if solution exists)
- Optimal: Yes (if all actions have equal cost)
- Time Complexity: O(b^d)
- Space Complexity: O(b^d) — major drawback (stores all nodes)
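The BFS behaviour described above can be sketched in a few lines. This is a minimal illustration, assuming the state space is given as an adjacency-list dictionary; the graph and the name `bfs_path` are hypothetical, chosen only for the example.

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([start])          # FIFO queue of nodes to expand
    parent = {start: None}             # doubles as the visited set
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:    # follow parent links back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                frontier.append(neighbor)
    return None

# Hypothetical example graph (adjacency list).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"],
         "D": ["F"], "E": ["F"], "F": []}
print(bfs_path(graph, "A", "F"))
```

The `parent` dictionary holds every discovered node, which is where the O(b^d) memory cost comes from.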
Depth-First Search (DFS)
- Explores deepest node first
- Uses LIFO stack (or recursion)
- Complete: No (can get stuck in infinite loops)
- Optimal: No
- Time Complexity: O(b^m) where m = max depth
- Space Complexity: O(bm) — much better than BFS
Depth-Limited Search (DLS)
- DFS with a depth limit L
- Complete: Only if L ≥ d (depth of shallowest solution)
- Optimal: No
Iterative Deepening Search (IDS)
- Repeatedly runs DLS with increasing depth limits (0, 1, 2, ...)
- Complete: Yes
- Optimal: Yes (for uniform cost)
- Time Complexity: O(b^d); the overhead of re-expanding shallow nodes is negligible (a constant factor)
- Space Complexity: O(bd)
- Best uninformed search for large state spaces
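A small sketch of IDS as repeated depth-limited search, under the assumption that the graph is an adjacency-list dictionary. The names `depth_limited`, `iterative_deepening`, and the `max_depth` cap are illustrative choices, not part of the notes.

```python
def depth_limited(graph, node, goal, limit, path):
    """DFS from node that stops descending once the depth limit is used up."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for neighbor in graph.get(node, []):
        if neighbor not in path:                 # avoid cycles along this path
            found = depth_limited(graph, neighbor, goal, limit - 1, path + [neighbor])
            if found is not None:
                return found
    return None

def iterative_deepening(graph, start, goal, max_depth=20):
    for limit in range(max_depth + 1):           # limits 0, 1, 2, ...
        found = depth_limited(graph, start, goal, limit, [start])
        if found is not None:
            return found
    return None

# Hypothetical example graph.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"], "G": []}
print(iterative_deepening(graph, "A", "G"))
```

Only the current path is stored at any time, which matches the O(bd) space bound listed above.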
Uniform Cost Search (UCS)
- Expands node with lowest path cost g(n)
- Uses priority queue
- Complete: Yes
- Optimal: Yes
- Time/Space: O(b^(1 + ⌊C*/ε⌋)) where C* = cost of the optimal solution, ε = minimum edge cost
Informed (Heuristic) Search
Heuristic Function h(n)
Estimates cost from node n to nearest goal. Must be admissible (never overestimates) for A* optimality.
Greedy Best-First Search
- Expands node that appears closest to goal: f(n) = h(n)
- Not optimal, can be misled by heuristics
- Time/Space: O(b^m) in the worst case, but a good heuristic can reduce it substantially
A* Search
- f(n) = g(n) + h(n) where g(n) = actual cost so far, h(n) = estimated remaining cost
- Complete: Yes
- Optimal: Yes, if h(n) is admissible (tree search) or consistent (graph search)
- Widely used for optimal pathfinding
- Disadvantage: Exponential space complexity
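The following is a hedged sketch of A* on a small grid, assuming 4-connected moves with unit step cost and Manhattan distance as the heuristic (admissible under those assumptions). The grid layout and the function name `a_star` are hypothetical.

```python
import heapq

def a_star(grid, start, goal):
    """Return the cost of a cheapest path on a grid, or None if unreachable.
    Cells equal to 1 are walls; every step costs 1."""
    def h(cell):                                   # Manhattan distance to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best_g = {start: 0}
    frontier = [(h(start), 0, start)]              # entries are (f, g, cell)
    while frontier:
        f, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):     # stale queue entry, skip it
            continue
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    heapq.heappush(frontier, (new_g + h((nr, nc)), new_g, (nr, nc)))
    return None

grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(a_star(grid, (0, 0), (2, 0)))   # the wall forces a detour around the right side
```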
Properties of Heuristics
- Admissible: h(n) ≤ h*(n) (never overestimates true cost) — guarantees optimality
- Consistent (Monotonic): h(n) ≤ c(n, n') + h(n'); f(n) never decreases along a path, so graph-search A* does not need to re-expand nodes
- Dominance: h2(n) ≥ h1(n) for all n → h2 dominates h1 → h2 gives better pruning
Local Search
Hill Climbing
- Greedily moves to neighbor with best value
- Problems: Local maxima, plateaus, ridges
- Solutions:
- Stochastic HC: Choose among uphill moves randomly
- First-choice HC: Generate random successors until one is better than the current state
- Random-restart HC: Multiple random starts
- Memory: O(1) — very efficient
Simulated Annealing
- Concept: Borrowed from metallurgy — heating and slowly cooling
- Accepts worse moves with probability: P = e^(-ΔE/T) where T = temperature
- High T: Accepts almost anything (exploration)
- Low T: Accepts almost only improvements (exploitation)
- Cooling Schedule: T decreases over time (e.g., T = T × 0.95 each step)
- Converges to the global optimum with probability approaching 1 if cooling is slow enough (often impractically slow)
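A minimal sketch of the acceptance rule and geometric cooling schedule described above, written for minimization. The one-dimensional objective, the neighbor function, and all parameter values are assumptions made only for illustration.

```python
import math
import random

def simulated_annealing(energy, neighbor, state, t=10.0, cooling=0.95, steps=500, seed=0):
    """Minimize energy(state) and return the best state seen."""
    rng = random.Random(seed)
    best = state
    for _ in range(steps):
        candidate = neighbor(state, rng)
        delta = energy(candidate) - energy(state)      # delta > 0 means a worse move
        # Always accept improvements; accept worse moves with P = e^(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            state = candidate
        if energy(state) < energy(best):
            best = state
        t = max(t * cooling, 1e-9)                     # T = T x 0.95 each step
    return best

# Hypothetical objective with its minimum at x = 3.
energy = lambda x: (x - 3) ** 2
neighbor = lambda x, rng: x + rng.uniform(-1, 1)
result = simulated_annealing(energy, neighbor, state=-10.0)
print(round(result, 2))
```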
Genetic Algorithm (GA)
- Concept: Inspired by natural selection (evolution)
- Components:
- Population: Set of candidate solutions (chromosomes)
- Fitness Function: Evaluates quality of solutions
- Selection: Fitter individuals chosen for reproduction (roulette wheel, tournament)
- Crossover: Combine two parents to create offspring (single-point, multi-point, uniform)
- Mutation: Random small changes to maintain diversity
- Replacement: New generation replaces old
- Parameters: Population size, crossover rate, mutation rate, generations
- Use Cases: Optimization, scheduling, feature selection
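The components listed above can be put together in a short sketch. It assumes bit-string chromosomes, tournament selection of size 2, single-point crossover, and the toy OneMax fitness (count of 1 bits); these are illustrative choices rather than the only way to build a GA.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=40,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Selection: the fitter of two random individuals reproduces.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:          # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            offspring.append(child)
        population = offspring                          # replacement
        best = max(population + [best], key=fitness)    # remember the best ever seen
    return best

best = genetic_algorithm(fitness=sum)   # OneMax: fitness is the number of 1 bits
print(sum(best), "of", len(best), "bits set")
```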
Search Algorithm Comparison
| Algorithm | Complete | Optimal | Time | Space | Type |
|---|---|---|---|---|---|
| BFS | Yes | Yes (uniform) | O(b^d) | O(b^d) | Uninformed |
| DFS | No | No | O(b^m) | O(bm) | Uninformed |
| UCS | Yes | Yes | O(b^(1+⌊C*/ε⌋)) | O(b^(1+⌊C*/ε⌋)) | Uninformed |
| Greedy | No | No | O(b^m) | O(b^m) | Informed |
| A* | Yes | Yes (admissible h) | O(b^d) | O(b^d) | Informed |
| Hill Climb | No | No | Varies (may stall at a local maximum) | O(1) | Local |
| GA | No | No | Varies | Varies | Local |
Knowledge Representation
Approaches
Propositional Logic
- Uses propositions (True/False statements) and logical connectives
- Operators: AND (∧), OR (∨), NOT (¬), IMPLIES (→), BICONDITIONAL (↔)
- Limitation: Cannot express relations between objects; each fact must be stated individually
- Inference: Modus Ponens, resolution, truth tables
First-Order Logic (Predicate Logic)
- Extends propositional logic with objects, relations, and quantifiers
- ∀x: For all x (universal)
- ∃x: There exists x (existential)
- Example: ∀x (Cat(x) → Mammal(x)) — "All cats are mammals"
- More expressive than propositional logic
Semantic Network
- Graph-based representation
- Nodes: Objects/concepts
- Edges: Relationships (is-a, has, part-of)
- Inheritance: Subclass inherits properties from superclass
Frames
- Concept: Structured knowledge (like objects/classes)
- Slots: Attributes with default values
- Similar to: Object-oriented classes
Ontologies
- Formal specification of concepts and relationships in a domain
- Components: Classes, properties, instances, axioms
- Use Case: Semantic Web, knowledge graphs
Expert Systems
Architecture
- Knowledge Base: Facts and rules (IF-THEN) from domain experts
- Inference Engine: Applies rules to facts to derive conclusions
- Forward Chaining: Data-driven (start from facts, apply rules)
- Backward Chaining: Goal-driven (start from goal, find supporting facts)
- Working Memory: Current facts and intermediate results
- Explanation Facility: Explains reasoning process
- User Interface: Interaction with users
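As a rough illustration of forward chaining, the sketch below keeps a working memory of known facts and fires IF-THEN rules until nothing new can be derived. The rule format (a list of premises plus one conclusion) and the example rules are hypothetical.

```python
def forward_chain(facts, rules):
    """rules: list of (premises, conclusion). Returns every derivable fact."""
    known = set(facts)                 # working memory
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)  # fire the rule
                changed = True
    return known

# Hypothetical knowledge base.
rules = [
    (["has_fever", "has_cough"], "flu_suspected"),
    (["flu_suspected"], "recommend_rest"),
]
print(sorted(forward_chain(["has_fever", "has_cough"], rules)))
```

Backward chaining would instead start from a goal such as `recommend_rest` and look for rules whose conclusions match it.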
Advantages
- Consistent decisions
- Preserves expert knowledge
- Available 24/7
- Can handle complex domains
Limitations
- Knowledge Acquisition Bottleneck: Difficult to extract expert knowledge
- Cannot learn from experience
- Brittle (fails outside knowledge domain)
- Expensive to build and maintain
Machine Learning Types
Supervised Learning
- Input: Labeled data (input-output pairs)
- Goal: Learn mapping from inputs to outputs
- Types:
- Classification: Output is a category (e.g., spam/not spam)
- Regression: Output is a continuous value (e.g., house price)
- Algorithms: Linear regression, logistic regression, SVM, decision trees, k-NN, neural networks
Unsupervised Learning
- Input: Unlabeled data
- Goal: Discover hidden patterns/structure
- Types:
- Clustering: Group similar data (e.g., k-means, hierarchical)
- Dimensionality Reduction: Reduce features (e.g., PCA, t-SNE)
- Association: Find rules (e.g., Apriori algorithm)
- Algorithms: k-means, DBSCAN, PCA, autoencoders
Reinforcement Learning
- Concept: Agent learns by interacting with environment
- Components:
- Agent: Learner/decision maker
- Environment: What agent interacts with
- State: Current situation
- Action: What agent can do
- Reward: Feedback signal
- Policy: Strategy (state → action mapping)
- Key Concepts:
- Exploration vs Exploitation: Try new actions vs use known good ones
- Value Function: Expected cumulative reward from a state
- Q-Learning: Model-free RL; learns Q(s,a) values
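- Bellman Equation: V(s) = max_a [R(s,a) + γ × V(s')], where s' is the state reached from s by action a and γ is the discount factor (deterministic form)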
- Algorithms: Q-learning, SARSA, Deep Q-Network (DQN), Policy Gradient
- Applications: Game playing (AlphaGo), robotics, recommendation
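A minimal tabular Q-learning sketch, assuming a hypothetical chain environment: states 0 to 3, actions left/right, and a reward of 1 only on reaching the rightmost state. Actions are chosen uniformly at random here, which is allowed because Q-learning is off-policy; the learning rate and discount are illustrative values.

```python
import random

def q_learning(n_states=4, episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q(s, a) on a chain. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(2)                       # pure exploration
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            # Move Q(s, a) toward reward + gamma * max over a' of Q(s', a').
            target = reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

q = q_learning()
policy = [row.index(max(row)) for row in q[:-1]]       # greedy action per non-goal state
print(policy)
```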
Semi-Supervised Learning
- Mix of labeled and unlabeled data
- Uses unlabeled data to improve learning
Self-Supervised Learning
- Generates labels from data itself (e.g., predicting masked words in BERT)
Key ML Algorithms
Linear Regression
- Goal: Predict continuous output using linear relationship: y = w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ
- Loss Function: Mean Squared Error (MSE) = (1/n) Σ(yᵢ - ŷᵢ)²
- Optimization: Gradient descent or normal equation (closed form)
- Assumptions: Linearity, independence, homoscedasticity, normality of residuals
- Regularization:
- Ridge (L2): Adds λΣwᵢ² penalty — shrinks coefficients
- Lasso (L1): Adds λΣ|wᵢ| penalty — can zero out coefficients (feature selection)
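To make the MSE loss and gradient descent concrete, here is a sketch for the single-feature case y = w₀ + w₁x. The learning rate, epoch count, and data are assumptions for the example; no regularization is applied.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w0 + w1 * x by gradient descent on mean squared error."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        grad_w0 = (2 / n) * sum(errors)                      # dMSE/dw0
        grad_w1 = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # dMSE/dw1
        w0 -= lr * grad_w0
        w1 -= lr * grad_w1
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]      # generated from y = 1 + 2x
w0, w1 = fit_linear(xs, ys)
print(round(w0, 4), round(w1, 4))
```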
Logistic Regression
- Goal: Binary classification (despite "regression" in name)
- Function: P(y=1|x) = 1/(1 + e^(-z)) where z = w·x + b (sigmoid function)
- Decision Boundary: P ≥ 0.5 → class 1; P < 0.5 → class 0
- Loss Function: Binary Cross-Entropy = -[y·log(p) + (1-y)·log(1-p)]
- Optimization: Gradient descent
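The sigmoid, the 0.5 decision threshold, and the binary cross-entropy loss above can be written directly. The weights and input below are made-up numbers, and the `eps` clamp is an added safeguard against log(0), not part of the formula.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias   # z = w·x + b
    return sigmoid(z)

def binary_cross_entropy(y, p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)                         # guard against log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

p = predict_proba([2.0, -1.0], 0.5, [1.0, 0.5])           # z = 2.0 - 0.5 + 0.5 = 2.0
label = 1 if p >= 0.5 else 0
print(round(p, 4), label, round(binary_cross_entropy(1, p), 4))
```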
Decision Trees
- Structure: Tree with internal nodes (feature tests), branches (outcomes), leaves (predictions)
- Splitting Criteria:
- Information Gain (ID3): Uses entropy — H(S) = -Σ pᵢ log₂(pᵢ)
- Gain Ratio (C4.5): Normalizes information gain
- Gini Index (CART): Gini = 1 - Σ pᵢ²
- Pruning: Pre-pruning (stop early) or post-pruning (grow full tree, then cut)
- Advantages: Interpretable, handles non-linear data, no feature scaling needed
- Disadvantages: Prone to overfitting, unstable (small data change → different tree)
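The splitting criteria can be computed from label counts alone. The sketch below implements entropy, the Gini index, and information gain for a candidate split; the label lists are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(entropy(parent), gini(parent), round(information_gain(parent, split), 4))
```

A tree learner would evaluate this gain for every candidate feature test and pick the largest.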
Random Forest
- Concept: Ensemble of decision trees using bagging (bootstrap aggregating)
- Process:
- Create multiple bootstrap samples (random sampling with replacement)
- Train a decision tree on each sample
- At each split, consider only a random subset of features
- Aggregate predictions (majority vote for classification, average for regression)
- Advantages: Reduces overfitting, handles high dimensionality, robust to outliers
- Out-of-Bag (OOB) Error: Each tree tested on samples not in its bootstrap — built-in validation
Support Vector Machine (SVM)
- Goal: Find the maximum margin hyperplane separating classes
- Margin: Distance between hyperplane and nearest data points (support vectors)
- Optimization: Minimize ½||w||² subject to yᵢ(w·xᵢ + b) ≥ 1
- Soft Margin: Allows some misclassification (C parameter controls trade-off)
- Kernel Trick: Maps data to higher dimension for non-linear separation
- Linear Kernel: K(x,y) = x·y
- Polynomial Kernel: K(x,y) = (x·y + c)^d
- RBF/Gaussian Kernel: K(x,y) = exp(-γ||x-y||²) — most popular
- Sigmoid Kernel: K(x,y) = tanh(αx·y + c)
- Advantages: Effective in high dimensions, memory efficient (uses support vectors only)
- Disadvantages: Doesn't scale well to very large datasets, sensitive to noise
k-Nearest Neighbors (k-NN)
- Concept: Classify based on majority vote of k closest training examples
- Distance Metrics:
- Euclidean: √(Σ(xᵢ - yᵢ)²)
- Manhattan: Σ|xᵢ - yᵢ|
- Minkowski: (Σ|xᵢ - yᵢ|^p)^(1/p)
- Choosing k: Small k → sensitive to noise; Large k → smoother boundaries
- Advantages: Simple, no training phase, naturally handles multi-class
- Disadvantages: Slow prediction (lazy learner), sensitive to irrelevant features and scale
- Curse of Dimensionality: Distance becomes meaningless in very high dimensions
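A compact k-NN classifier sketch using Euclidean distance and a majority vote. The training points and labels are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 7.0), "B"), ((6.5, 8.0), "B")]
print(knn_predict(train, (1.2, 1.1)), knn_predict(train, (6.4, 6.9)))
```

All the work happens at prediction time, which is why k-NN is called a lazy learner.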
k-Means Clustering
- Algorithm:
- Initialize k centroids randomly
- Assign each point to nearest centroid
- Recalculate centroids as mean of assigned points
- Repeat until convergence (centroids don't change)
- Objective: Minimize within-cluster sum of squares (WCSS)
- Choosing k: Elbow method, silhouette score
- Advantages: Simple, fast: O(n×k×i×d) for n points, k clusters, i iterations, d dimensions
- Disadvantages: Sensitive to initialization, assumes spherical clusters, need to specify k
- k-Means++: Smart initialization — spreads initial centroids apart
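The four algorithm steps above map onto a short loop. This sketch uses plain random initialization (not k-Means++) and tuples as points; the sample data is invented and deliberately well separated.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                      # random initialization
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:                     # converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```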
Neural Networks Basics
- Structure:
- Input Layer: Receives features
- Hidden Layers: Process information (can be multiple)
- Output Layer: Produces prediction
- Neuron: z = Σ(wᵢxᵢ) + b; a = activation(z)
- Activation Functions:
- Sigmoid: σ(z) = 1/(1+e^(-z)) — range (0,1); vanishing gradient problem
- Tanh: range (-1,1); zero-centered but still vanishing gradient
- ReLU: max(0, z) — most popular; fast; can "die" (always output 0)
- Leaky ReLU: max(αz, z) — fixes dying ReLU
- Softmax: Used in output layer for multi-class classification
- Training: Forward propagation → compute loss → backpropagation → update weights
- Backpropagation: Chain rule of calculus to compute gradients layer by layer
- Optimizers: SGD, Momentum, Adam (adaptive learning rate), RMSprop
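A single neuron and the activation functions listed above can be sketched as follows. The weights and inputs are arbitrary example values, and subtracting the maximum inside `softmax` is a common numerical-stability step rather than part of the definition.

```python
import math

def relu(z):
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    return max(alpha * z, z)

def softmax(zs):
    m = max(zs)                                    # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def neuron(weights, bias, inputs, activation):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias   # z = sum(w_i * x_i) + b
    return activation(z)

print(neuron([0.5, -1.0], 0.1, [2.0, 3.0], relu))   # z = -1.9, so ReLU outputs 0.0
print(softmax([1.0, 2.0, 3.0]))
```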
Convolutional Neural Networks (CNN)
- Purpose: Image processing, computer vision
- Key Layers:
- Convolutional Layer: Applies filters/kernels to detect features (edges, textures)
- Pooling Layer: Reduces spatial dimensions (Max pooling, Average pooling)
- Fully Connected Layer: Final classification
- Key Concepts: Stride, padding, feature maps, receptive field
- Famous Architectures: LeNet, AlexNet, VGG, ResNet, Inception
Recurrent Neural Networks (RNN)
- Purpose: Sequential data (text, time series, speech)
- Key Idea: Hidden state carries information from previous time steps
- Problem: Vanishing/exploding gradients for long sequences
- Solutions:
- LSTM (Long Short-Term Memory): Uses gates (forget, input, output) to control information flow
- GRU (Gated Recurrent Unit): Simplified LSTM with reset and update gates
- Applications: Language modeling, machine translation, speech recognition
Bias-Variance Tradeoff
Decomposition of Error
Total Error = Bias² + Variance + Irreducible Error
| Component | Description | Cause |
|---|---|---|
| Bias | Error from overly simplistic assumptions | Underfitting (model too simple) |
| Variance | Error from sensitivity to training data fluctuations | Overfitting (model too complex) |
| Irreducible Error | Noise inherent in data | Cannot be reduced by any model |
The Tradeoff
- Simple Model (High Bias, Low Variance): Underfits — misses patterns
- Complex Model (Low Bias, High Variance): Overfits — captures noise
- Goal: Find the sweet spot that minimizes total error
Managing the Tradeoff
- Reduce Bias: More complex model, more features, longer training
- Reduce Variance: More data, regularization, simpler model, ensemble methods
- Regularization: L1 (Lasso), L2 (Ridge), Elastic Net — penalize complexity
Overfitting and Underfitting
Underfitting
- Model too simple to capture underlying pattern
- Symptoms: High training error, high test error
- Solutions:
- Increase model complexity
- Add more features
- Train longer
- Reduce regularization
Overfitting
- Model memorizes training data including noise
- Symptoms: Low training error, high test error
- Solutions:
- Get more training data
- Use regularization (L1, L2, dropout)
- Reduce model complexity
- Use cross-validation
- Early stopping (for neural networks)
- Pruning (for decision trees)
- Data augmentation
Regularization Techniques
| Technique | Description |
|---|---|
| L1 (Lasso) | Adds λΣ\|wᵢ\| penalty; can zero out coefficients (feature selection) |
| L2 (Ridge) | Adds λΣwᵢ² penalty; shrinks coefficients |
| Elastic Net | Combines L1 and L2 |
| Dropout | Randomly disable neurons during training (NN) |
| Early Stopping | Stop training when validation error increases |
| Data Augmentation | Artificially increase training data |
Cross-Validation
Purpose
Estimate model performance on unseen data and tune hyperparameters.
Types
k-Fold Cross-Validation
- Split data into k equal folds
- Train on k-1 folds, test on remaining fold
- Repeat k times (each fold used as test once)
- Average the k performance scores
- Common: k = 5 or k = 10
- Advantage: Every data point used for both training and testing
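The splitting procedure can be sketched as an index generator. This version assumes the data is already shuffled (in practice it usually is shuffled first) and spreads any remainder over the first folds; the name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be trained and scored once per fold, and the k scores averaged.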
Stratified k-Fold
- Preserves class distribution in each fold
- Important for imbalanced datasets
Leave-One-Out Cross-Validation (LOOCV)
- k = n (number of samples)
- Train on n-1 samples, test on 1
- Repeat n times
- Advantage: Nearly unbiased estimate
- Disadvantage: Computationally expensive
Holdout Method
- Simple split: 70-80% training, 20-30% testing
- Disadvantage: Performance estimate depends on specific split
Evaluation Metrics
Classification Metrics
Confusion Matrix
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
Key Metrics
| Metric | Formula | Interpretation |
|---|---|---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | Overall correctness |
| Precision | TP/(TP+FP) | Of predicted positives, how many are correct |
| Recall (Sensitivity) | TP/(TP+FN) | Of actual positives, how many detected |
| Specificity | TN/(TN+FP) | Of actual negatives, how many are correctly identified |
| F1 Score | 2×(Precision×Recall)/(Precision+Recall) | Harmonic mean of precision and recall |
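As a worked example of the formulas in the table, the sketch below computes each metric from confusion-matrix counts. The counts are invented, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical counts: 40 TP, 10 FP, 20 FN, 30 TN.
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print({name: round(value, 4) for name, value in m.items()})
```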
When to Use What?
- Accuracy: Balanced classes
- Precision: When FP is costly (e.g., spam detection — don't want legitimate email marked as spam)
- Recall: When FN is costly (e.g., cancer detection — don't want to miss actual cases)
- F1 Score: Imbalanced classes; need balance between precision and recall
ROC Curve and AUC
- ROC (Receiver Operating Characteristic): Plots True Positive Rate (Recall) vs False Positive Rate (1-Specificity) at various thresholds
- AUC (Area Under Curve): Single number summarizing ROC
- AUC = 1.0: Perfect classifier
- AUC = 0.5: Random classifier
- AUC < 0.5: Worse than random
- Advantage: Threshold-independent evaluation
Regression Metrics
| Metric | Formula | Interpretation |
|---|---|---|
| MAE | (1/n)Σ\|yᵢ - ŷᵢ\| | Average absolute error |
| MSE | (1/n)Σ(yᵢ - ŷᵢ)² | Average squared error (penalizes large errors) |
| RMSE | √MSE | Same units as target variable |
| R² (R-squared) | 1 - (SS_res/SS_tot) | Proportion of variance explained (at most 1; negative when the model fits worse than predicting the mean) |
| Adjusted R² | 1 - [(1-R²)(n-1)/(n-p-1)] | R² adjusted for number of predictors |
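A short sketch of the regression metrics above, computed on a made-up set of targets and predictions. Adjusted R² is left out because it also needs the number of predictors p.

```python
import math

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # total sum of squares
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": 1 - ss_res / ss_tot}

r = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print({name: round(value, 4) for name, value in r.items()})
```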
Ensemble Methods
Ensemble methods combine multiple models to improve performance.
Bagging (Bootstrap Aggregating)
- Concept: Train multiple models on different bootstrap samples; aggregate predictions
- Reduces variance without increasing bias
- Example: Random Forest
- Key: Models trained independently (can be parallel)
Boosting
- Concept: Sequentially train models; each new model focuses on errors of previous ones
- Reduces bias (and some variance)
- Key: Models trained sequentially (each depends on previous)
AdaBoost (Adaptive Boosting)
- Initialize equal weights for all training samples
- Train weak learner (e.g., decision stump)
- Increase weights of misclassified samples
- Assign weight to learner based on accuracy
- Repeat; final prediction = weighted vote
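One round of the reweighting steps can be sketched as below. The notes do not give formulas, so this assumes the standard binary AdaBoost update with labels in {-1, +1}: learner weight α = ½ ln((1 - error)/error) and sample weights scaled by e^(-α·y·h(x)), then normalized. The labels and predictions are invented.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round: returns (learner_weight, new_sample_weights)."""
    # Weighted error of the weak learner on the current sample weights.
    error = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    alpha = 0.5 * math.log((1 - error) / error)            # assumes 0 < error < 1
    # Misclassified samples (y * p = -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, 1, -1, 1]          # the third sample is misclassified
alpha, weights = adaboost_round([0.2] * 5, y_true, y_pred)
print(round(alpha, 4), [round(w, 3) for w in weights])
```

After this round the misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.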
Gradient Boosting
- Each new model fits the residual errors of the previous ensemble
- Uses gradient descent to minimize loss function
- Variants: XGBoost, LightGBM, CatBoost (industry standard)
XGBoost
- Regularized gradient boosting
- Handles missing values, supports parallel processing
- Frequently used in winning Kaggle solutions, especially on tabular data
Stacking
- Concept: Train multiple diverse models (base learners); train a meta-learner on their predictions
- Base learners: Different algorithms (e.g., SVM, RF, k-NN)
- Meta-learner: Combines base predictions (e.g., logistic regression)
Ensemble Comparison
| Method | Strategy | Reduces | Parallel? |
|---|---|---|---|
| Bagging | Independent bootstrap samples | Variance | Yes |
| Boosting | Sequential error correction | Bias | No |
| Stacking | Meta-learner on base models | Both | Partially |
Feature Engineering
Feature engineering is the process of creating/transforming features to improve model performance.
Feature Selection Methods
| Method | Description |
|---|---|
| Filter | Statistical measures (correlation, chi-square, mutual information) |
| Wrapper | Use model performance to select features (forward selection, backward elimination) |
| Embedded | Selection during model training (Lasso, tree-based importance) |
Feature Transformation
| Technique | Description |
|---|---|
| Normalization | Scale to [0,1]: x' = (x - min)/(max - min) |
| Standardization | Scale to mean=0, std=1: x' = (x - μ)/σ |
| Log Transform | Reduces skewness in right-skewed data |
| Binning | Convert continuous to categorical |
| One-Hot Encoding | Convert categorical to binary columns |
| Label Encoding | Convert categories to integers (for ordinal data) |
| PCA | Principal Component Analysis — reduce dimensions while preserving variance |
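Three of the transformations in the table can be sketched with the standard library. The example assumes non-constant numeric columns (so the denominators are non-zero) and uses the population standard deviation for standardization; the function names are illustrative.

```python
import statistics

def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]          # x' = (x - min)/(max - min)

def standardize(values):
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)                      # population standard deviation
    return [(v - mu) / sigma for v in values]              # x' = (x - mu)/sigma

def one_hot(values):
    categories = sorted(set(values))                       # one binary column per category
    return [[1 if v == c else 0 for c in categories] for v in values]

print(min_max_scale([10, 20, 30, 40, 50]))
print([round(z, 3) for z in standardize([10, 20, 30, 40, 50])])
print(one_hot(["red", "green", "red"]))
```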
Handling Missing Data
- Remove: Drop rows/columns (if missing is random and small)
- Impute: Mean, median, mode, or predictive imputation
- Indicator: Add binary column indicating missingness
Handling Imbalanced Data
- Oversampling: Duplicate minority class samples, or synthesize new ones (SMOTE)
- Undersampling: Remove majority class samples
- Class Weights: Assign higher weight to minority class in loss function
- Threshold Adjustment: Change decision threshold
Key Formulas Summary
| Concept | Formula |
|---|---|
| A* Search | f(n) = g(n) + h(n) |
| Sigmoid | σ(z) = 1/(1 + e^(-z)) |
| MSE | (1/n) Σ(yᵢ - ŷᵢ)² |
| Entropy | H(S) = -Σ pᵢ log₂(pᵢ) |
| Gini Index | Gini = 1 - Σ pᵢ² |
| Precision | TP / (TP + FP) |
| Recall | TP / (TP + FN) |
| F1 Score | 2 × (P × R) / (P + R) |
| R² | 1 - (SS_res / SS_tot) |
| Simulated Annealing | P(accept) = e^(-ΔE/T) |
| Error Decomposition | Bias² + Variance + Irreducible |
Exam Tips
- Search Algorithms: Know BFS vs DFS vs A* — completeness, optimality, complexity
- A* Optimality: Understand admissibility and consistency conditions
- ML Types: Clearly distinguish supervised, unsupervised, and reinforcement learning
- SVM: Understand margin, support vectors, and kernel trick
- Bias-Variance: Know the tradeoff and how to manage it
- Evaluation Metrics: Know when to use precision vs recall vs F1
- Ensemble Methods: Bagging vs Boosting — key differences
- Neural Networks: Understand backpropagation, activation functions, CNN vs RNN
- Decision Trees: Know splitting criteria (entropy, Gini, information gain)
- Cross-Validation: Understand k-fold and stratified k-fold
Practice Questions
10 MCQs for Artificial Intelligence and Machine Learning with detailed explanations.
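Q1. Iterative Deepening Search has time complexity O(b^d). How is the overhead of repeatedly re-expanding shallow nodes best described?
- A. a category (e.g., spam/not spam)
- B. negligible
- C. the goal
- D. mammals
✅ Correct Answer: Option B
Explanation:
The correct answer is Option B: negligible. Most nodes lie at the deepest level, so re-expanding the shallower levels adds only a constant factor and the time complexity stays O(b^d).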
This concept is covered under Artificial Intelligence and Machine Learning in the CBDT Assistant Director Systems syllabus. The answer is established through standard definitions and widely accepted principles in the field.
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option C — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option D — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
Q2. Which statement correctly describes reasoning as an AI process?
- A. It is defined exclusively at the physical layer of system design
- B. It has been deprecated in all modern implementations
- C. It applies only to analog systems and not digital ones
- D. It uses rules to reach conclusions
✅ Correct Answer: Option D
Explanation:
The correct answer is Option D: reasoning means using rules to reach conclusions.
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option B — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option C — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
Q3. Which statement correctly describes learning as an AI process?
- A. It applies only to analog systems and not digital ones
- B. It is defined exclusively at the physical layer of system design
- C. It has been deprecated in all modern implementations
- D. It acquires information and rules
✅ Correct Answer: Option D
Explanation:
The correct answer is Option D: learning means acquiring information and rules.
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option B — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option C — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
Q4. Which statement correctly describes Evolutionary AI?
- A. It is defined exclusively at the physical layer of system design
- B. It applies only to analog systems and not digital ones
- C. It is inspired by biological evolution (e.g., genetic algorithms)
- D. It has been deprecated in all modern implementations
✅ Correct Answer: Option C
Explanation:
The correct answer is Option C: Evolutionary AI is inspired by biological evolution, with genetic algorithms as the standard example.
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option B — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option D — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
Q5. In supervised learning, what is the output of a classification model?
- A. mammals
- B. the goal
- C. a category (e.g., spam/not spam)
- D. negligible
✅ Correct Answer: Option C
Explanation:
The correct answer is Option C: a classification model outputs a category (e.g., spam/not spam).
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option B — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option D — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
Q6. Which of the following best describes search in AI?
- A. mammals
- B. the goal
- C. negligible
- D. a fundamental AI technique for finding sequences of actions to achieve goals
✅ Correct Answer: Option D
Explanation:
The correct answer is Option D: search is a fundamental AI technique for finding sequences of actions to achieve goals.
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option B — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option C — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
Q7. In supervised learning, what is the output of a regression model?
- A. mammals
- B. a continuous value (e.g., house price)
- C. negligible
- D. the goal
✅ Correct Answer: Option B
Explanation:
The correct answer is Option B: a regression model outputs a continuous value (e.g., house price).
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option C — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option D — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
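As an illustration of Q7, the sketch below fits a straight line by ordinary least squares and predicts a continuous value. The helper `fit_line` and the size and price figures are made up for the example and are not taken from the source.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: house size (square metres) -> price (arbitrary units).
sizes = [50, 80, 100, 120]
prices = [110, 170, 210, 250]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept  # a continuous value, not a class label
print(slope, intercept, predicted)
```

The prediction for an unseen size is a real number on a continuous scale, which is what separates regression from classification.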
Q8. Regarding Symbolic AI, which statement is correct?
- A. This approach has been deprecated in all modern implementations
- B. This concept applies only to analog systems and not digital ones
- C. It relies on rule-based, explicit knowledge representation (e.g., expert systems, logic programming)
- D. This is defined exclusively at the physical layer of system design
✅ Correct Answer: Option C
Explanation:
The correct answer is Option C: Symbolic AI relies on rule-based, explicit knowledge representation, as in expert systems and logic programming.
This concept is covered under Artificial Intelligence and Machine Learning in the CBDT Assistant Director Systems syllabus. The answer is established through standard definitions and widely accepted principles in the field.
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option B — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option D — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
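The rule-based style described in Q8 can be sketched as a tiny forward-chaining loop over if-then rules. The function `forward_chain`, the rule format, and the example facts are hypothetical and chosen only for illustration.

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire a rule when all of its conditions are already known.
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rules: (conditions, conclusion)
rules = [
    (("has_fever", "has_cough"), "possible_flu"),
    (("possible_flu",), "recommend_rest"),
]

derived = forward_chain({"has_fever", "has_cough"}, rules)
print(sorted(derived))
```

The knowledge here lives in explicit, human-readable rules rather than in learned weights, which is the distinguishing feature of the symbolic approach.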
Q9. Regarding General AI (Strong AI), which statement is correct?
- A. This is defined exclusively at the physical layer of system design
- B. This concept applies only to analog systems and not digital ones
- C. This approach has been deprecated in all modern implementations
- D. It refers to human-level intelligence across all domains and remains theoretical
✅ Correct Answer: Option D
Explanation:
The correct answer is Option D: General AI refers to human-level intelligence across all domains and remains theoretical.
This concept is covered under Artificial Intelligence and Machine Learning in the CBDT Assistant Director Systems syllabus. The answer is established through standard definitions and widely accepted principles in the field.
Why other options are incorrect:
- Option A — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option B — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option C — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
Q10. Regarding the AI Approaches table, which statement is correct?
- A. It has three columns: Approach, Description, and Example
- B. This is defined exclusively at the physical layer of system design
- C. This approach has been deprecated in all modern implementations
- D. This concept applies only to analog systems and not digital ones
✅ Correct Answer: Option A
Explanation:
The correct answer is Option A: the AI Approaches table has three columns (Approach, Description, and Example).
This concept is covered under Artificial Intelligence and Machine Learning in the CBDT Assistant Director Systems syllabus. The answer is established through standard definitions and widely accepted principles in the field.
Why other options are incorrect:
- Option B — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option C — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.
- Option D — This option is factually incorrect or describes a concept from a different domain, making it an invalid choice for this question.