WK3: Artificial Neural Networks II: Advanced Concepts

Welcome to Week 3
Artificial Neural Networks II: Advanced Concepts
Module Lecturer: Dr Raghav Kovvuri
Email: raghav.kovvuri@ieg.ac.uk

Artificial Intelligence Programming (Higher Education, degree)

Slide 1 - Slide

Recap and Objectives
Quick Review: Network Architecture (from previous week)
  • Input, Hidden, and Output Layers
  • Fully Connected Networks
This Week's Objectives:
  • Explore Advanced Network Architectures
  • Understand Deep Learning Concepts
  • Learn Advanced Training Techniques
  • Implement Advanced ANN Concepts in Python

Slide 2 - Slide

Gradient Descent Algorithm (1)
Definition: An optimization algorithm used to minimize the loss function by iteratively moving towards the minimum.
Key Steps:
  1. Initialize parameters (weights and biases)
  2. Calculate the gradient of the loss function
  3. Update parameters in the opposite direction of the gradient
  4. Repeat steps 2-3 until convergence

Slide 3 - Slide

Gradient Descent Algorithm (2)
Types:
  • Batch Gradient Descent: Uses the entire dataset
  • Stochastic Gradient Descent (SGD): Uses a single sample
  • Mini-batch Gradient Descent: Uses a small batch of samples
Formula:
θ = θ - α * ∇J(θ)
Where:
θ: Parameters
α: Learning rate
∇J(θ): Gradient of the loss function
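As an illustration of this update rule, here is a minimal NumPy sketch of batch gradient descent fitting a one-parameter linear model with an MSE loss. The data, initial value, and learning rate are illustrative choices, not part of the slides.

```python
import numpy as np

# Illustrative data for y = 3x plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3 * X + rng.normal(scale=0.1, size=100)

theta = 0.0   # parameter, initialised arbitrarily
alpha = 0.1   # learning rate
for _ in range(200):
    grad = (2 / len(X)) * np.sum((theta * X - y) * X)  # dMSE/dtheta over the whole dataset
    theta = theta - alpha * grad                       # theta = theta - alpha * grad J(theta)

print(f"learned theta ~ {theta:.3f}")  # approaches 3
```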

Slide 4 - Slide

Loss Functions
Purpose: Measure the difference between predicted and actual outputs.
1. Mean Squared Error (MSE)
  • Used for regression problems
  • Calculates the average squared difference between predictions and actual values
  • Formula: MSE = (1/n) * Σ(y_true - y_pred)^2
  • Penalizes larger errors more heavily due to squaring
2. Binary Cross-Entropy
  • Used for binary classification
  • Measures the performance of a model whose output is a probability value between 0 and 1
  • Formula: BCE = -Σ(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred))
  • Heavily penalizes confident and wrong predictions
3. Categorical Cross-Entropy
  • Used for multi-class classification
  • Measures the dissimilarity between the predicted probability distribution and the actual distribution
  • Formula: CCE = -Σ(y_true * log(y_pred))
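These losses can be written directly in NumPy. The sketch below follows the formulas above but averages over samples and clips predictions away from 0 and 1 for numerical stability, which the slide formulas omit.

```python
import numpy as np

EPS = 1e-12  # keeps log() away from zero

def mse(y_true, y_pred):
    """Mean Squared Error for regression."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred):
    """BCE for binary classification; y_pred holds probabilities in (0, 1)."""
    y_pred = np.clip(y_pred, EPS, 1 - EPS)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred):
    """CCE for multi-class classification; rows are one-hot labels and predicted distributions."""
    y_pred = np.clip(y_pred, EPS, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# A confident wrong prediction is penalised far more heavily than a cautious correct one
print(binary_cross_entropy(np.array([1.0]), np.array([0.01])))  # ~4.6
print(binary_cross_entropy(np.array([1.0]), np.array([0.9])))   # ~0.1
```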

Slide 5 - Slide

Choosing a Loss Function: Regression, Binary Classification, Multi-class Classification

Slide 6 - Open question

Discussion
Self-Learning: Feedforward Networks, MLPs, and Backpropagation
You are expected to independently explore the key concepts of Feedforward Networks, Multi-Layer Perceptrons (MLPs), and the Backpropagation algorithm. These foundational topics in neural networks are critical for understanding the mechanics behind deep learning models.

Learning Objectives:
  • Understand the architecture and functionality of Feedforward Networks.
  • Grasp the structure and layers of Multi-Layer Perceptrons (MLPs).
  • Comprehend how the Backpropagation algorithm adjusts weights during training to minimize error.
Timed activity: 20:00
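As a starting point for this self-study, the following is a minimal NumPy sketch of backpropagation in a one-hidden-layer MLP trained on the XOR problem. The architecture, initialisation, and learning rate are illustrative choices and are not part of the Canvas material.

```python
import numpy as np

# Toy dataset: XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden (4 hidden units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: squared-error loss, chain rule through each sigmoid
    # (constant factors are folded into the learning rate)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(3))  # typically approaches [[0], [1], [1], [0]]
```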

Slide 7 - Slide

Advanced Network Architectures
1. Convolutional Neural Networks (CNNs)
  • Specialized for processing grid-like data (e.g., images)
  • Key components: Convolutional layers, Pooling layers
2. Recurrent Neural Networks (RNNs)
  • Designed for sequential data
  • Ability to maintain internal state (memory)
3. Long Short-Term Memory (LSTM) Networks
  • Special type of RNN
  • Addresses vanishing gradient problem in standard RNNs
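For illustration, a minimal CNN along the lines of item 1 can be sketched in Keras as below. The layer sizes and the 28x28 grayscale input are assumptions; the slides do not prescribe a specific framework here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative CNN: convolution/pooling feature extractor + dense classifier head
cnn = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                      # e.g. 28x28 grayscale images
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # convolutional layer
    layers.MaxPooling2D(pool_size=2),                     # pooling layer
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # 10-class output
])
cnn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
cnn.summary()
```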

Slide 8 - Slide

LSTM (Long Short-Term Memory)
LSTM: A type of RNN designed to handle long-term dependencies.

Key Components:
  • Forget Gate
  • Input Gate
  • Output Gate
  • Cell State
Exercise: Download LSTM_StockPrediction.py (for VS Code) or the .ipynb version for Jupyter Notebook from Canvas
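The sketch below is not the Canvas exercise file; it is only a minimal illustration of how an LSTM layer is typically stacked for sequence regression in Keras. The window length and layer sizes are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal sequence-regression LSTM: a window of 60 timesteps of one feature -> next value
lstm_model = models.Sequential([
    layers.Input(shape=(60, 1)),   # (timesteps, features); window size is an assumption
    layers.LSTM(50),               # forget, input and output gates plus cell state live inside
    layers.Dense(1),               # predicted next value
])
lstm_model.compile(optimizer="adam", loss="mse")
lstm_model.summary()
```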

Slide 9 - Slide

Deep Learning Concepts
1. Vanishing/Exploding Gradients
Problem: Gradients become very small or very large in deep networks
Solutions:
  • Proper weight initialization
  • Activation functions (e.g., ReLU)
  • Gradient clipping
2. Transfer Learning
  • Using pre-trained models for new tasks
  • Fine-tuning vs. feature extraction
3. Batch Normalization
  • Normalizing inputs to each layer
  • Speeds up training and adds regularization effect
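A hedged Keras sketch combining the mitigations above: He weight initialisation, ReLU activations, batch normalisation between layers, and gradient clipping on the optimizer. Layer sizes and the clipping threshold are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(100,)),
    layers.Dense(128, kernel_initializer="he_normal"),  # proper weight initialization
    layers.BatchNormalization(),                        # normalize inputs to the next layer
    layers.Activation("relu"),                          # ReLU avoids saturating gradients
    layers.Dense(128, kernel_initializer="he_normal"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(1, activation="sigmoid"),
])
# clipnorm applies gradient clipping when the optimizer updates the weights
model.compile(optimizer=optimizers.Adam(clipnorm=1.0), loss="binary_crossentropy")
```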

Slide 10 - Slide

Advanced Training Techniques
1. Learning Rate Scheduling
  • Adjusting learning rate during training
  • Techniques: Step decay, Exponential decay, Cyclic learning rates
2. Advanced Optimizers
  • Adam: Adaptive Moment Estimation
  • RMSprop: Root Mean Square Propagation
  • Adagrad: Adaptive Gradient Algorithm
3. Regularization Techniques
  • L1 and L2 regularization
  • Dropout
  • Data augmentation
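As an illustration of points 1 and 3, the following Keras sketch feeds an exponential learning-rate decay schedule into Adam and applies L2 regularisation and dropout; all hyperparameter values are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, regularizers

# Exponential learning-rate decay fed into the Adam optimizer
schedule = optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)

model = models.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 regularization
    layers.Dropout(0.3),                                     # dropout regularization
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=optimizers.Adam(learning_rate=schedule),
              loss="binary_crossentropy", metrics=["accuracy"])
```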

Slide 11 - Slide

Hyperparameter Tuning
1. Grid Search
  • Exhaustive search through a specified parameter space
2. Random Search
  • Randomly sampling from the parameter space
3. Bayesian Optimization
  • Using probabilistic model to guide the search
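A minimal sketch of random search in plain Python is shown below; train_and_evaluate is a hypothetical callable that trains a model with the given parameters and returns a validation score. Grid search would instead iterate exhaustively over the same space (e.g. with itertools.product).

```python
import random

# Illustrative hyperparameter space
param_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size":    [16, 32, 64],
    "dropout":       [0.2, 0.3, 0.5],
}

def random_search(n_trials, train_and_evaluate):
    """train_and_evaluate: hypothetical callable mapping a params dict to a validation score."""
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {k: random.choice(v) for k, v in param_space.items()}  # random sampling
        score = train_and_evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```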

Slide 12 - Slide

Advanced ANN Architectures
1. Residual Networks (ResNet)
  • Skip connections to allow training of very deep networks
2. Generative Adversarial Networks (GANs)
  • Two networks (Generator and Discriminator) trained simultaneously
3. Autoencoders
  • Unsupervised learning for dimensionality reduction and feature learning
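As a small illustration of the ResNet idea, the sketch below builds a dense residual block with a skip connection in Keras; the original ResNet uses convolutional blocks, so this is only a simplified analogue.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, units):
    """Two dense layers whose output is added back to the block input (skip connection)."""
    shortcut = x
    x = layers.Dense(units, activation="relu")(x)
    x = layers.Dense(units)(x)
    x = layers.Add()([x, shortcut])      # the skip connection
    return layers.Activation("relu")(x)

inputs = layers.Input(shape=(64,))       # input width matches the block width here
x = residual_block(inputs, 64)
x = residual_block(x, 64)
outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```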

Slide 13 - Slide

Practical Considerations 
  • Data Preprocessing and Augmentation
  • Model Interpretability (e.g., SHAP values, Grad-CAM)
  • Handling Imbalanced Datasets
  • Deployment Considerations (Model compression, Quantization)
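One common way to handle imbalanced datasets is class weighting. The sketch below computes balanced class weights with scikit-learn on illustrative labels; the resulting dictionary can then be passed to Keras via model.fit(..., class_weight=...).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative 90/10 imbalanced binary labels
y_train = np.array([0] * 90 + [1] * 10)

weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(y_train), y=y_train)
class_weight = dict(enumerate(weights))   # e.g. {0: ~0.56, 1: ~5.0}
print(class_weight)

# The dictionary can then be passed to Keras: model.fit(..., class_weight=class_weight)
```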

Slide 14 - Slide

Conclusion and Next Steps
Recap of Advanced ANN Concepts:
  • Advanced Architectures (CNNs, RNNs, LSTMs)
  • Deep Learning Challenges and Solutions
  • Advanced Training Techniques
  • Hyperparameter Tuning
  • Cutting-edge Architectures (ResNet, GANs, Autoencoders)
Preview of Next Week: Introduction to Genetic Algorithms

Slide 15 - Slide