
Monday, May 29, 2023

Beginning Machine and Deep Learning with Zeek logs

Why this series?

When teaching SANS SEC595: Applied Data Science and Machine Learning for Cybersecurity Professionals (https://www.sans.org/cyber-security-courses/applied-data-science-machine-learning/), I am always asked,

"Will you be sharing your demo notebooks?" or "Can we get a copy of your demo notebooks?" or ... well you get the point.

My answer is always no. It is not that I do not want to share (sharing is caring :-D), but the demo notebooks by themselves would not make sense or add real value. Hence, this series!

This is my supplemental work, similar to what I would do in the demos but with a lot more details and references.

This series primarily uses Zeek's conn.log file. Notebooks 23 and 24 use Zeek's DNS and HTTP logs, respectively.

The series includes the following:
01 - Beginning Numpy
02 - Beginning Tensorflow
03 - Beginning PyTorch
04 - Beginning Pandas
05 - Beginning Matplotlib
06 - Beginning Data Scaling
07 - Beginning Principal Component Analysis (PCA)
08 - Beginning Machine Learning Anomaly Detection - Isolation Forest and Local Outlier Factor
09 - Beginning Unsupervised Machine Learning - Clustering - K-means and DBSCAN
10 - Beginning Supervised Learning - Machine Learning - Logistic Regression, Decision Trees and Metrics
11 - Beginning Linear Regression - Machine Learning
12 - Beginning Deep Learning - Anomaly Detection with AutoEncoders, Tensorflow
13 - Beginning Deep Learning - Anomaly Detection with AutoEncoders, PyTorch
14 - Beginning Deep Learning - Linear Regression, Tensorflow
15 - Beginning Deep Learning - Linear Regression, PyTorch
16 - Beginning Deep Learning - Classification, Tensorflow
17 - Beginning Deep Learning - Classification, PyTorch
18 - Beginning Deep Learning - Classification - regression - MIMO - Functional API Tensorflow
19 - Beginning Deep Learning - Convolution Networks - Tensorflow
20 - Beginning Deep Learning - Convolution Networks - PyTorch
21 - Beginning Regularization - Early Stopping, Dropout, L2 (Ridge), L1 (Lasso)
22 - Beginning Model TFServing

But conn.log is not the only log file within Zeek. Let's build some models for DNS and HTTP logs.
I chose unsupervised learning because these data do not come with labels.
23 - Continuing Anomaly Learning - Zeek DNS Log - Machine Learning
24 - Continuing Unsupervised Learning - Zeek HTTP Log - Machine Learning

This was a specific ask by someone in one of my classes.
25 - Beginning - Reading Executables and Building a Neural Network to make predictions on suspicious vs suspicious

With 25 notebooks in this series, it is quite possible there are things I could have or should have done differently.
If you find anything you think fits those criteria, drop me a line.
If you find this series beneficial, I would greatly appreciate your feedback.

Some other notebooks I think you might find beneficial:



Get the notebooks by clicking the links above or from my blog: www.securitynik.com or my GitHub: github.com/SecurityNik