Natural Language Processing for OSINT & Threat Analysis (W54) - eForensics

DURATION: 7 hours

CPE POINTS: On completion you get a certificate granting you 7 CPE points.

COMPLETE, SELF-PACED, PRERECORDED

In this course, we apply Natural Language Processing (NLP) to cyber threat analysis and OSINT in order to assess and analyze data gathered from open sources and social networks. The course is project-based: after learning a concept, we immediately put it into practice by analyzing a dataset. We will work on election-related topics and use the gathered datasets to evaluate tweets for hate speech and to measure popularity, using plots to show the most common words and popularity by month. We also learn how to rank documents and find the best matches for a query, and how to cluster documents based on their similarity.

This course is for everyone who wants to become familiar with Natural Language Processing and use it for OSINT and cyber threat analysis.

Forensics professionals
OSINT enthusiasts
Security analysts
Penetration testers

Machine learning is no longer an unfamiliar field; it has spread into nearly every area of technology. NLP is one of the major branches of machine learning and aims to process and extract information from text data. With the rapid growth of social networks, cyber threat analysis using these tools is more critical than ever.

This is a short but valuable course that aims to help students strengthen their OSINT and threat hunting skills and take a first step into machine learning, especially NLP. It offers a new and unique approach to OSINT and web crawling.

Course benefits:

Tools
Skills

What tools will you use?

Bash
Python
NLTK
spaCy
Matplotlib
GloVe
sklearn
doc2term
lxml

What skills will you gain?

Text analysis and training models for hate speech and sentiment analysis.
Parsing raw data and using NLP pipelines.
Crawling information from open sources.

Course general information:

Course format:

Self-paced
Pre-recorded
Accessible even after you finish the course
No preset deadlines
Materials are video, labs, and text

Equipment

In this course, we work on the Kali Linux distribution, which can be installed on a virtual machine or run live. Kali Linux is not required, however; the concepts can be implemented on other systems.

Experience

Before beginning this course, make sure you have a good knowledge of Python and the requests library.
No prior experience with NLP is needed!

YOUR INSTRUCTOR: Saeed Dehqan

Saeed is a security researcher and project leader at OWASP and an instructor for Hakin9.org e-learning.
He has extensive experience in security areas such as network security, secure coding, threat hunting, applied deep learning for threat analysis, DevOps, and more. He has 5 years of research experience and works with several companies in the software engineering and cybersecurity fields. He is also a mentor for Google Summer of Code 2021. He is passionate about Natural Language Processing and uses it for cybersecurity purposes.

COURSE SYLLABUS

Module 0

Before the course

Introduction: Wordcloud and histogram for term-frequency.

Module 1

An introduction to NLP

In this module, we first learn beginner NLP concepts and then how to implement NLP preprocessing pipelines that produce a clean dataset and corpus. The topics may be new to some students, but they are simple. The course assumes that participants have no prior experience with machine learning or NLP.

File formats
Regular expressions
Punctuation
Tokenization
Standardization
Stopwords
Lemmatization
Stemming
Ngram
Wordcloud
Term-frequency
Histogram for term-frequency
TF-IDF

Exercises

Writing a pipeline for preprocessing.
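
As a rough illustration of what such a preprocessing pipeline might look like, here is a minimal sketch that uses only the Python standard library so it runs without extra downloads. The stopword list, the sample sentence, and the function names are assumptions made for this example; the course itself works with NLTK and spaCy, which provide fuller stopword lists as well as lemmatization and stemming, both omitted here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); NLTK and spaCy ship fuller ones.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def preprocess(text):
    """Lowercase, strip URLs and punctuation, tokenize, and drop stopwords."""
    text = text.lower()                         # standardization
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    tokens = re.findall(r"[a-z0-9#@']+", text)  # tokenization; punctuation is dropped
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n=2):
    """Return all n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


sample = "The election is on Tuesday. Vote in the election: https://example.org/vote"
tokens = preprocess(sample)
print(tokens)
print(ngrams(tokens))
print(term_frequency(tokens).most_common(2))
```

The term-frequency counts returned by `term_frequency` are the kind of data that a histogram or word cloud would then be drawn from.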

Module 2

Exploring cyberspace

In this module, we will learn how to scrape Twitter, Reddit, Google, and PubMed and how to use lxml and XPath.

Extracting snippets from Google
Extracting article abstracts from PubMed
Extracting tweets from Twitter without the API
Extracting trending hashtags from Twitter
Extracting posts from Reddit based on topicality
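
To give a feel for XPath-style selection, the sketch below pulls titles, links, and snippets out of a small, well-formed markup string. The markup, class names, and function name are hypothetical and do not reflect the structure of any real site. It uses the limited XPath subset in the standard library's `xml.etree.ElementTree` so that it runs anywhere; with lxml, `lxml.html.fromstring(...)` returns elements with an `.xpath()` method that supports full XPath 1.0 and tolerates real-world HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded results page.
PAGE = """
<html><body>
  <div class="result">
    <a href="https://example.org/a">First title</a>
    <p class="snippet">First snippet.</p>
  </div>
  <div class="result">
    <a href="https://example.org/b">Second title</a>
    <p class="snippet">Second snippet.</p>
  </div>
  <div class="ad"><p class="snippet">Sponsored text.</p></div>
</body></html>
"""


def extract_results(markup):
    """Return title, URL, and snippet for every <div class="result"> block."""
    root = ET.fromstring(markup.strip())
    results = []
    # ".//div[@class='result']" selects matching divs at any depth.
    for block in root.findall(".//div[@class='result']"):
        link = block.find("a")
        snippet = block.find("p[@class='snippet']")
        results.append({
            "title": link.text,
            "url": link.get("href"),
            "snippet": snippet.text,
        })
    return results


results = extract_results(PAGE)
for item in results:
    print(item["title"], item["url"], item["snippet"])
```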

Exercises

Extract hashtags, phone numbers, and emails from content using regular expressions.
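
One possible starting point for this exercise is sketched below. The patterns are deliberately loose assumptions rather than complete validators: real phone number and email formats vary widely, so expect both false positives and misses on real data. The sample text is made up.

```python
import re

HASHTAG_RE = re.compile(r"#\w+")
# Simplified email pattern (an assumption); it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose phone pattern (an assumption): optional "+", a digit, then at least
# six digits, spaces, dots, dashes, or parentheses, ending in a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_entities(text):
    """Return the hashtags, emails, and phone-like strings found in text."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


sample = "Call +1 202-555-0143 or mail press@example.org about #Election2024 and #vote."
found = extract_entities(sample)
print(found)
```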

Module 3

Gaining knowledge from data

In this module, we process the gathered data to extract information from it. We will classify, label, merge, and semantically cluster the gathered tweets. Then we use a sentiment analysis algorithm and a hate speech model to gauge the popularity of the subjects of the tweets.

Classification and clustering, and using them in practice
Similarity formulas: cosine, euclidean
Document clustering
Using GloVe for clustering
Merge algorithms to sort results accordingly
Sentiment analysis algorithms, and using them in practice
Training a hate speech model
Evaluating election tweets with the hate speech model
Save the trained model

Exercises

Implement a simple document retriever.
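
A simple retriever can be built by combining TF-IDF weighting from Module 1 with cosine similarity from this module. The sketch below is one way to do it in pure Python; the class name, the smoothed IDF variant, and the sample documents are assumptions made for illustration, and in practice a library such as sklearn would usually handle the vectorization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Rank documents against a query by cosine similarity of TF-IDF vectors."""

    def __init__(self, documents):
        self.documents = list(documents)
        tokenized = [tokenize(d) for d in self.documents]
        n_docs = len(tokenized)
        # Document frequency: in how many documents each term appears.
        doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
        # Smoothed IDF (one common variant): log((1 + N) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1
                    for t, df in doc_freq.items()}
        self.vectors = [self._vectorize(tokens) for tokens in tokenized]

    def _vectorize(self, tokens):
        """Build a unit-length sparse TF-IDF vector; unknown terms are ignored."""
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query, top_k=3):
        """Return up to top_k (document, score) pairs with a non-zero score."""
        q = self._vectorize(tokenize(query))
        scored = []
        for idx, vec in enumerate(self.vectors):
            # Both vectors are unit length, so the dot product is the cosine.
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, idx))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.documents[i], round(s, 3)) for s, i in scored[:top_k]]


docs = [
    "Election results spark debate on social networks",
    "New phishing campaign targets bank customers",
    "Candidates debate cyber security before the election",
]
retriever = TfidfRetriever(docs)
print(retriever.search("phishing bank"))
print(retriever.search("election debate"))
```

Normalizing every vector to unit length up front keeps `search` to a single sparse dot product per document, which is enough for small corpora like the tweet sets used in the course.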

Contact:

If you have any questions, please contact us at [email protected].

© HAKIN9 MEDIA SP. Z O.O. SP. K. 2023