Monday, May 29, 2023

Beginning Machine and Deep Learning with Zeek logs

Why this series?

When teaching the SANS SEC595: Applied Data Science and Machine Learning for Cybersecurity Professionals  https://www.sans.org/cyber-security-courses/applied-data-science-machine-learning/ I am always asked,

"Will you be sharing your demo notebooks?" or "Can we get a copy of your demo notebooks?" or ... well you get the point.

My answer is always no. Not that I do not want to share, (sharing is caring :-D) , but the demo notebooks  by themselves, would not make sense or add real value. Hence, this series! 

This is my supplemental work, similar to what I would do in the demos but with a lot more details and references.

This series uses primarily Zeek's conn.log file. Notebooks 23 and 24 uses Zeek's DNS and HTTP logs respectively. 

The series includes the following:
01 - Beginning Numpy
02 - Beginning Tensorflow
03 - Beginning PyTorch
04 - Beginning Pandas
05 - Beginning Matplotlib
06 - Beginning Data Scaling
07 - Beginning Principal Component Analysis (PCA)
08 - Beginning Machine Learning Anomaly Detection - Isolation Forest and Local Outlier Factor
09 - Beginning Unsupervised Machine Learning - Clustering - K-means and DBSCAN
10 - Beginning Supervise Learning - Machine Learning - Logistic Regression, Decision Trees and Metrics
11 - Beginning Linear Regression - Machine Learning
12 - Beginning Deep Learning - Anomaly Detection with AutoEncoders, Tensorflow
13 - Beginning Deep Learning - Anomaly Detection with AutoEncoders, PyTroch
14 - Beginning Deep Learning - Linear Regression, Tensorflow
15 - Beginning Deep Learning - Linear Regression, PyTorch
16 - Beginning Deep Learning - Classification, Tensorflow
17 - Beginning Deep Learning - Classification, Pytorch
18 - Beginning Deep Learning - Classification - regression - MIMO - Functional API Tensorflow
19 - Beginning Deep Learning - Convolution Networks - Tensorflow
20 - Beginning Deep Learning - Convolution Networks - PyTorch
21 - Beginning Regularization - Early Stopping, Dropout, L2 (Ridge), L1 (Lasso)
22 - Beginning Model TFServing

But conn.log is not the only log file within Zeek. Let's build some models for DNS and HTTP logs.
I choose unsupervised, because there are no labels coming with these data.
23 - Continuing Anomaly Learning - Zeek DNS Log - Machine Learning
24 - Continuing Unsupervised Learning - Zeek HTTP Log - Machine Learning

This was a specific ask by someone in one of my class.
25 - Beginning - Reading Executables and Building a Neural Network to make predictions on suspicious vs suspicious

With 25 notebooks in this series, it is quite possible there are things I could have or should have done differently.
If you find any thing, you think fits those criteria, drop me a line.
If you find this series beneficial, I would greatly appreciate your feedback.

Some other notebooks I think you might find beneficial:


Get the notebooks by clicking the links above or from my blog: www.securitynik.com or my GitHub: github.com/SecurityNik