WK10: Natural Language Processing

Welcome to Week 10
Natural Language Processing (NLP)
Module Lecturer: Dr Raghav Kovvuri
Email: raghav.kovvuri@ieg.ac.uk

Artificial Intelligence Programming | Higher Education (degree)


Introduction to NLP (1)
What is Natural Language Processing?
NLP sits at the intersection of artificial intelligence, linguistics, and computer science. It enables computers to understand, interpret, and generate human language.

Why is it important?
  • Bridges the gap between human communication and machine understanding.
  • Drives innovations in technology like voice recognition, sentiment analysis, and more.

Introduction to NLP (2)
Real-world Applications:
  • Chatbots (e.g., customer support systems)
  • Speech-to-text (e.g., transcription software)
  • Sentiment Analysis (e.g., social media monitoring)
  • Translation services (e.g., Google Translate)
Brief History of NLP:
  • Early Rule-based systems (1950s-1980s)
  • Statistical approaches (1990s-2010s)
  • Deep learning (2010s onwards), with Transformer models from 2017

Core Concepts (1)
Tokenization: Breaking text into sentences or words.
Examples:
  • Sentence Tokenization: "NLP is fun! It's challenging." → ["NLP is fun!", "It's challenging."]
  • Word Tokenization: "NLP is fun!" → ["NLP", "is", "fun", "!"]
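The two tokenization examples above can be approximated with regular expressions. This is a minimal sketch under simple assumptions, not how library tokenizers such as NLTK's work internally; the helper names split_sentences and split_words and the splitting rules are illustrative choices.

```python
import re

def split_sentences(text):
    """Naive rule: split after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(text):
    """Return words (keeping internal apostrophes) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

print(split_sentences("NLP is fun! It's challenging."))
# → ['NLP is fun!', "It's challenging."]
print(split_words("NLP is fun!"))
# → ['NLP', 'is', 'fun', '!']
```

A rule this simple would split wrongly after abbreviations such as "Dr.", which is one reason trained tokenizers are usually preferred in practice.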
Stemming and Lemmatization:
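  • Stemming: Stripping suffixes with heuristic rules; the result may not be a real word (e.g., "running" → "run", "studies" → "studi").
  • Lemmatization: Reducing words to their dictionary form using vocabulary and grammar (e.g., "running" → "run", "better" → "good").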
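The difference between the two can be seen in a toy sketch. The suffix list and the lookup table below are hypothetical and far smaller than what real stemmers (e.g., Porter) or lemmatizers use; the point is only that stemming applies blind rules while lemmatization consults a dictionary.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative suffix list, checked in this order

def toy_stem(word):
    """Strip the first matching suffix, then undouble a final consonant (runn -> run)."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word

# Hypothetical mini-dictionary mapping inflected forms to lemmas.
LEMMA_TABLE = {"running": "run", "ran": "run", "studies": "study", "better": "good"}

def toy_lemmatize(word):
    """Look the word up; fall back to the lowercased word if it is unknown."""
    return LEMMA_TABLE.get(word.lower(), word.lower())

for w in ["running", "studies", "ran", "better"]:
    print(w, "stem:", toy_stem(w), "lemma:", toy_lemmatize(w))
```

Note that the stemmer leaves "ran" and "better" unchanged and turns "studies" into the non-word "studi", while the dictionary lookup returns "run", "good", and "study".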

Core Concepts (2)
Part of Speech (POS) Tagging:
  • Assigning a grammatical tag to each word (e.g., "NLP is fun" → "NLP/NNP, is/VBZ, fun/JJ", using Penn Treebank tags).
Named Entity Recognition (NER):
  • Identifying and classifying named entities such as people, locations, and dates (e.g., "John went to Paris on Monday." → "John: Person, Paris: Location, Monday: Date").
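As a rough illustration of POS tagging and NER, the sketch below uses plain dictionary lookups. Real taggers (e.g., in NLTK or spaCy) use trained statistical models instead; the lexicon, the gazetteer, and the function names here are all hypothetical, and the tags follow the Penn Treebank style (VBZ for "is", NNP for proper nouns).

```python
import re

# Hypothetical mini-lexicon: lowercase word -> Penn Treebank-style tag.
POS_LEXICON = {"nlp": "NNP", "is": "VBZ", "fun": "JJ", "john": "NNP",
               "went": "VBD", "to": "TO", "paris": "NNP", "on": "IN", "monday": "NNP"}

# Hypothetical gazetteer: known name -> entity type.
GAZETTEER = {"John": "Person", "Paris": "Location", "Monday": "Date"}

def tokenize(text):
    """Split into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def pos_tag(tokens):
    """Tag by lookup; unknown words default to NN, punctuation is tagged as itself."""
    return [(t, POS_LEXICON.get(t.lower(), "NN") if t[0].isalnum() else t) for t in tokens]

def find_entities(tokens):
    """Return (token, type) pairs for tokens listed in the gazetteer."""
    return [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]

print(pos_tag(tokenize("NLP is fun")))
# → [('NLP', 'NNP'), ('is', 'VBZ'), ('fun', 'JJ')]
print(find_entities(tokenize("John went to Paris on Monday.")))
# → [('John', 'Person'), ('Paris', 'Location'), ('Monday', 'Date')]
```

A lookup like this cannot handle unseen names or words whose tag depends on context, which is what trained models are for.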

Interactive Discussion
Prompt: How do you interact with NLP in your daily life?
  • Share examples of NLP-driven technologies you use.
Examples for reference:
  • Voice Assistants: Alexa, Siri, Google Assistant.
  • Auto-correct and Grammar Checkers: Grammarly, MS Word.
  • Translation Services: Google Translate.
  • Email Filters: Spam classification in Gmail.

NLP Pipeline
Stages of NLP Workflow:
Text Preprocessing:
  • Tokenization, removing stopwords, stemming/lemmatization.
  • Example: Cleaning social media posts for sentiment analysis.
Feature Extraction: Representing text as vectors (e.g., Bag of Words, TF-IDF, word embeddings).
Model Training: Training machine learning models like Naive Bayes, SVM, or deep learning.
Evaluation: Assessing accuracy, precision, recall, F1-score.
Deployment: Integrating NLP models into applications (e.g., chatbots).
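The stages above (minus deployment) can be sketched end to end in plain Python. This is a minimal illustration assuming a tiny made-up dataset, a hand-picked stopword list, bag-of-words counts as features, and a Naive Bayes classifier with add-one smoothing; in practice a library such as scikit-learn would normally handle these steps.

```python
import math
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "is", "it", "this", "i", "and", "was", "to"}  # tiny illustrative list

def preprocess(text):
    """Text preprocessing: lowercase, tokenize, drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def train_naive_bayes(docs, labels):
    """Feature extraction + training: per-class bag-of-words counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(preprocess(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in preprocess(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def precision_recall_f1(y_true, y_pred, positive):
    """Evaluation: precision, recall, and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up training and test data, for illustration only.
train_docs = ["I love this phone", "Great battery and great screen",
              "This is a terrible product", "I hate the awful battery"]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_docs, train_labels)

test_docs = ["love the great screen", "awful terrible phone"]
test_labels = ["pos", "neg"]
predictions = [predict(model, d) for d in test_docs]
print(predictions)  # → ['pos', 'neg']
print(precision_recall_f1(test_labels, predictions, positive="pos"))  # → (1.0, 1.0, 1.0)
```

The perfect scores only reflect how small and clean this toy dataset is; real evaluation needs a held-out test set of realistic size.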

Common Challenges in NLP
Key Issues:
Ambiguity in Language:
  • Words with multiple meanings (e.g., "bank" as a financial institution or riverbank).
Context Understanding:
  • Difficulty tracking long-range context (e.g., pronoun resolution: in "John went to the park. He enjoyed it.", "He" refers to John and "it" to the park).
Multilingual Processing:
  • Handling text across languages with varying grammar, syntax, and script.
Scalability: Processing large datasets efficiently.

Basic NLP Programming
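Install the nltk and spacy libraries:
pip install nltk spacy
python -m spacy download en_core_web_sm      # small English pipeline for spaCy
Download the NLTK data:
import nltk
nltk.download('punkt')                       # sentence and word tokenizer models
nltk.download('stopwords')                   # stopword lists
nltk.download('averaged_perceptron_tagger')  # POS tagger model
# Newer NLTK releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead.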
Tools Overview:
NLTK: For basic NLP tasks.
spaCy: For advanced NLP tasks like dependency parsing, NER.

Basic Text Processing
Download NLP.py from Canvas.
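The contents of NLP.py are not reproduced here. As a rough idea of the kind of basic text processing involved (stopword removal, word frequency, and a simple sentiment score), here is a hypothetical standard-library sketch; the word lists and function names are made up for illustration and are not taken from NLP.py.

```python
import re
from collections import Counter

# Tiny illustrative word lists; fuller resources (e.g., NLTK's stopword
# corpus or a proper sentiment lexicon) would be used in practice.
STOPWORDS = {"the", "is", "and", "a", "but", "it", "was", "i", "of"}
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def clean_tokens(text):
    """Lowercase, keep alphabetic tokens, and remove stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def sentiment_score(tokens):
    """Count positive words minus negative words (> 0 means positive overall)."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

text = "The camera is great and the screen is great, but the battery is poor."
tokens = clean_tokens(text)
print(tokens)                          # → ['camera', 'great', 'screen', 'great', 'battery', 'poor']
print(Counter(tokens).most_common(1))  # → [('great', 2)]
print(sentiment_score(tokens))         # → 1
```

Word counting like this ignores negation ("not great") and context, so it is only a starting point.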

Modern NLP Applications
Deep Dive into Applications:
Machine Translation: Translate text across languages (e.g., Google Translate).
Chatbots: Automate customer service (e.g., Dialogflow).
Text Summarization: Generate concise summaries of articles (e.g., news aggregators).
Sentiment Analysis: Detect emotions in text (e.g., product reviews).
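To make one of these applications concrete, here is a minimal sketch of extractive text summarization: each sentence is scored by how frequent its words are in the whole text, and the top-scoring sentences are kept. This is a classic frequency heuristic chosen for illustration, not how modern summarizers work; the stopword list, function names, and sample text are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "was", "a", "is", "of", "in", "to"}  # tiny illustrative list

def content_words(sentence):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(w for s in sentences for w in content_words(s))
    scores = [sum(freq[w] for w in content_words(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))

article = ("Transformers changed NLP. "
           "Transformers power translation and chatbots. "
           "The weather was nice.")
print(summarize(article))
# → Transformers power translation and chatbots.
```

Summing frequencies favours longer sentences; dividing each score by the sentence length is a common adjustment.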

Emerging Trends
Key Innovations:
Transformer Models: Revolutionized NLP by building on the attention mechanism.
BERT and GPT: State-of-the-art models for contextual understanding.
Few-shot Learning: Adapt models to new tasks with only a few examples.
Multilingual Models: Handle diverse languages (e.g., mBERT).
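The attention mechanism behind Transformer models can be sketched in a few lines. The code below computes scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, on plain Python lists; it is a minimal illustration with made-up numbers, and real models use learned projections, multiple heads, and tensor libraries.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of the query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# The query matches the first key more closely, so the output leans
# towards the first value vector.
out = attention(queries=[[1.0, 0.0]],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
print(out)
```

Each output is a blend of all value vectors, weighted by how well the query matches each key; this is what lets a model draw on context from anywhere in the sentence.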

Ethics in NLP
Bias in Language Models:
  • Example: Gender stereotypes in job-related language (e.g., a model associating "nurse" with "female").
Privacy Concerns: Risks of data misuse in personal assistants.
Social Impact:
  • Positive: Accessibility (e.g., screen readers).
  • Negative: Spread of misinformation (e.g., fake news generation).
Future Implications:
  • Balance innovation with ethical considerations.

Session Overview
Understanding NLP: NLP bridges human language and computers, enabling tools like chatbots, translation, and sentiment analysis.
Core Concepts: Tokenization, Stemming, Lemmatization, NER, POS Tagging.
Hands-On Insights: Preprocessed text, removed stopwords, and analyzed word frequency & sentiment.
Advanced Applications: Modern NLP drives chatbots, machine translation, and text summarization.
Cutting-edge models: BERT, GPT.
Ethical Considerations: Address bias, privacy, and societal impact responsibly.

End

https://create.kahoot.it/details/e90defd5-13e1-480d-ab28-b3b754842934