In this course, we apply Natural Language Processing (NLP) to cyber threat analysis and OSINT, assessing and analyzing data gathered from open sources and social networks. The course is project-based: after learning a concept, we immediately put it into practice on a dataset. We will work with election-related topics, using the gathered datasets to evaluate hate speech in tweets, measure popularity, and plot the most common words and monthly popularity.

This course is for everyone who wants to become familiar with Natural Language Processing and use it for OSINT and cyber threat analysis, including:

  • Forensics professionals
  • OSINT enthusiasts
  • Security analysts
  • Penetration testers

Machine learning is no longer an unfamiliar field; it has reached nearly every area of technology. NLP is one of the major branches of machine learning and aims to process and extract information from text data. With the rapid growth of social networks, cyber threat analysis using these tools is more critical than ever.

This is a short but valuable course that aims to help students strengthen their OSINT and threat hunting skills and take a first step into machine learning, especially NLP. It offers a new and unique approach to OSINT and web crawling.

Course benefits:

What tools will you use?

  • Bash
  • Python
  • NLTK
  • spaCy
  • Matplotlib
  • GloVe
  • Vis in js
  • Pyplot

What skills will you gain?

  • Text analysis and training models for hate speech.
  • Parsing raw data and using NLP pipelines.
  • Crawling information from open sources.

Course general information:

DURATION: 7 hours

CPE POINTS: On completion you get a certificate granting you 7 CPE points.

LAUNCH OCTOBER 20th, SELF-PACED, PUBLISHED ON A WEEKLY SCHEDULE

Course format:

  • Self-paced
  • Pre-recorded
  • Accessible even after you finish the course
  • No preset deadlines
  • Materials are video, labs, and text
  • All videos captioned

Equipment

In this course, we work on the Kali Linux distribution. It can be installed on a virtual machine or run as a live system.

Experience

  • Before beginning this course, make sure you have a good knowledge of Python.
  • No prior experience with NLP is needed! 

YOUR INSTRUCTOR: Saeed Dehqan

Saeed is currently a security researcher and project leader at OWASP and an instructor for Hakin9.org e-learning.
He has extensive experience in security areas such as network security, secure coding, threat hunting, applied deep learning for threat analysis, DevOps, and more. He has 5 years of research experience and works with companies in the software engineering and cybersecurity fields. He is also a mentor in Google Summer of Code 2021. He is passionate about Natural Language Processing and applies it to cybersecurity.

 


COURSE SYLLABUS


Module 0

Before the course

Introduction: Wordcloud and histogram for term-frequency.


Module 1

An introduction to NLP

In this section, we first learn basic NLP concepts and then implement NLP preprocessing pipelines to produce a clean dataset and corpus. The topics may be new to some students, but they are simple. The course assumes that participants have no prior experience with machine learning or NLP.

  • File formats
  • Regular expressions
  • Punctuation
  • Tokenization
  • Standardization
  • Stopwords
  • Lemmatization
  • Stemming
  • N-grams
  • Wordcloud
  • Term-frequency
  • Histogram for term-frequency
  • TF-IDF
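
To give a feel for how several of these topics fit together, here is a minimal sketch of a preprocessing pipeline. It is not the course's own code: it uses only the Python standard library rather than NLTK or spaCy, and the stopword list and the function names (preprocess, ngrams, term_frequency) are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; NLTK and spaCy ship fuller ones).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def preprocess(text):
    """Lowercase, drop URLs and punctuation, tokenize, and remove stopwords."""
    text = text.lower()                               # standardization
    text = re.sub(r"https?://\S+", " ", text)         # strip URLs
    tokens = re.findall(r"[a-z0-9]+", text)           # tokenization; punctuation is dropped
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


if __name__ == "__main__":
    sample = "The election is on Tuesday. Polls open in the morning: https://example.org/polls"
    tokens = preprocess(sample)
    print(tokens)
    print(ngrams(tokens, 2))
    print(term_frequency(tokens).most_common(3))
```

Lemmatization, stemming, and TF-IDF are left out of this sketch; in practice they would typically come from a library such as NLTK or spaCy.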

Exercises

Writing a pipeline for preprocessing.


Module 2

Exploring cyberspace

In this module, we will learn how to scrape Twitter, Reddit, Google, and PubMed.

  • Extracting snippets from Google
  • Extracting article abstracts from PubMed
  • Extracting tweets from Twitter without API
  • Extracting trending hashtags from Twitter
  • Extracting posts from Reddit by topic

Exercises

Extract hashtags, phone numbers, emails from content with regular expressions.
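
As a starting point for this kind of extraction, the sketch below uses Python's re module. The patterns are deliberately loose assumptions for illustration, not the course's reference solution: real-world phone numbers and email addresses vary far more than these expressions cover, and the extract_entities helper is a hypothetical name.

```python
import re

# Illustrative patterns only; they trade precision for simplicity.
HASHTAG_RE = re.compile(r"#\w+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional leading "+", then digits with optional spaces, dots, dashes, or
# parentheses in between; at least 7 characters, starting and ending on a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def extract_entities(text):
    """Return hashtags, emails, and phone-like strings found in text."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "Call +1 (555) 010-9999 or mail tips@example.org #election2020 #vote"
    for kind, values in extract_entities(sample).items():
        print(kind, values)
```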


Module 3

Gaining knowledge from data

In this module, we will process the gathered data to extract information from it. We will classify, label, merge, and build semantic clusters from the gathered tweets. Then we use sentiment analysis and a hate speech model to gauge how popular each subject is in the tweets.

  • Classification and clustering, and using them in action
  • Similarity formulas: cosine, euclidean
  • Document clustering
  • Using GloVe for clustering
  • Merge algorithms to sort results accordingly
  • Sentiment analysis algorithms, and using them in action
  • Training a hate speech model
  • Evaluating election tweets with the hate speech model
  • Saving the trained model
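
The two similarity formulas named in this list can be sketched in a few lines of plain Python. This is an illustrative implementation over lists of numbers, not the course's code; in practice the vectors would typically be TF-IDF or GloVe vectors, and a library such as NumPy would do the arithmetic.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    print(cosine_similarity(u, v))   # same direction, different length
    print(euclidean_distance(u, v))  # but not the same point
```

Cosine similarity ignores vector length, which is often useful when comparing documents of different sizes, while Euclidean distance is sensitive to it.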

Exercises

Implement a simple document retriever.
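
One possible shape for such a retriever, offered as a hedged sketch rather than the expected solution: documents and the query are turned into TF-IDF vectors and ranked by cosine similarity. The TfidfRetriever class, its method names, and the smoothed IDF variant are all assumptions made for this example, and only the standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Rank documents against a query using TF-IDF vectors and cosine similarity."""

    def __init__(self, documents):
        self.documents = list(documents)
        doc_tokens = [tokenize(d) for d in self.documents]
        n_docs = len(self.documents)
        # Document frequency: in how many documents each term appears.
        doc_freq = Counter(term for tokens in doc_tokens for term in set(tokens))
        # Smoothed IDF (one common variant) so every known term keeps a positive weight.
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
        self.doc_vectors = [self._vectorize(tokens) for tokens in doc_tokens]

    def _vectorize(self, tokens):
        """Sparse TF-IDF vector as a dict; terms unseen at build time are ignored."""
        counts = Counter(tokens)
        return {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}

    @staticmethod
    def _cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        if norm_u == 0 or norm_v == 0:
            return 0.0
        return dot / (norm_u * norm_v)

    def search(self, query, top_k=3):
        """Return up to top_k (document, score) pairs with a non-zero score."""
        q = self._vectorize(tokenize(query))
        scored = [(self._cosine(q, d), i) for i, d in enumerate(self.doc_vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(self.documents[i], score) for score, i in scored[:top_k] if score > 0]


if __name__ == "__main__":
    docs = [
        "Polls open early on election day",
        "New phishing campaign targets bank customers",
        "Candidates debate election security",
    ]
    retriever = TfidfRetriever(docs)
    for doc, score in retriever.search("election security"):
        print(round(score, 3), doc)
```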


QUESTIONS?
If you have any questions, please contact our eLearning Manager Marta at [email protected].


© HAKIN9 MEDIA SP. Z O.O. SP. K. 2013
