Monday, May 29, 2023

Beginning Machine and Deep Learning with Zeek logs

Why this series?

When teaching SANS SEC595: Applied Data Science and Machine Learning for Cybersecurity Professionals (https://www.sans.org/cyber-security-courses/applied-data-science-machine-learning/), I am always asked,

"Will you be sharing your demo notebooks?" or "Can we get a copy of your demo notebooks?" or ... well you get the point.

My answer is always no. It is not that I do not want to share (sharing is caring :-D), but the demo notebooks by themselves would not make sense or add real value. Hence, this series!

This is my supplemental work, similar to what I would do in the demos but with a lot more detail and references.

This series primarily uses Zeek's conn.log file. Notebooks 23 and 24 use Zeek's DNS and HTTP logs, respectively.
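For readers who have not opened a Zeek log before, here is a minimal sketch of reading one in its default tab-separated layout. It uses only the Python standard library; the sample rows are made up for illustration, the field list is a trimmed subset of what a real conn.log carries, and `read_zeek_tsv` is a hypothetical helper name, not something taken from the notebooks.

```python
import io

# Tiny inline sample in Zeek's default TSV layout.
# The values are invented and the field list is a trimmed subset of a real conn.log.
SAMPLE_CONN_LOG = "\n".join([
    "#separator \\x09",
    "#set_separator\t,",
    "#empty_field\t(empty)",
    "#unset_field\t-",
    "#path\tconn",
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tid.resp_p\tproto\tduration\torig_bytes",
    "#types\ttime\tstring\taddr\taddr\tport\tenum\tinterval\tcount",
    "1685350000.000000\tCabc123\t10.0.0.5\t10.0.0.1\t53\tudp\t0.002\t60",
    "1685350001.500000\tCdef456\t10.0.0.5\t192.0.2.10\t443\ttcp\t-\t-",
    "#close\t2023-05-29-12-00-00",
]) + "\n"


def read_zeek_tsv(handle):
    """Parse a Zeek TSV log into a list of dicts keyed by the #fields names."""
    fields = []
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#fields"):
            # Column names follow the '#fields' tag, tab-separated
            fields = line.split("\t")[1:]
        elif line.startswith("#"):
            # Other metadata lines (#separator, #types, #close, ...) are skipped
            continue
        else:
            values = line.split("\t")
            # '-' is Zeek's default marker for an unset field
            rows.append({k: (None if v == "-" else v) for k, v in zip(fields, values)})
    return rows


records = read_zeek_tsv(io.StringIO(SAMPLE_CONN_LOG))
for record in records:
    print(record["id.orig_h"], "->", record["id.resp_h"], record["proto"], record["duration"])
```

All values come back as strings (or None for unset fields), so numeric columns such as duration and orig_bytes still need converting before any modelling. To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.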

The series includes the following:
01 - Beginning Numpy
02 - Beginning Tensorflow
03 - Beginning PyTorch
04 - Beginning Pandas
05 - Beginning Matplotlib
06 - Beginning Data Scaling
07 - Beginning Principal Component Analysis (PCA)
08 - Beginning Machine Learning Anomaly Detection - Isolation Forest and Local Outlier Factor
09 - Beginning Unsupervised Machine Learning - Clustering - K-means and DBSCAN
10 - Beginning Supervised Learning - Machine Learning - Logistic Regression, Decision Trees and Metrics
11 - Beginning Linear Regression - Machine Learning
12 - Beginning Deep Learning - Anomaly Detection with AutoEncoders, Tensorflow
13 - Beginning Deep Learning - Anomaly Detection with AutoEncoders, PyTorch
14 - Beginning Deep Learning - Linear Regression, Tensorflow
15 - Beginning Deep Learning - Linear Regression, PyTorch
16 - Beginning Deep Learning - Classification, Tensorflow
17 - Beginning Deep Learning - Classification, PyTorch
18 - Beginning Deep Learning - Classification - regression - MIMO - Functional API Tensorflow
19 - Beginning Deep Learning - Convolution Networks - Tensorflow
20 - Beginning Deep Learning - Convolution Networks - PyTorch
21 - Beginning Regularization - Early Stopping, Dropout, L2 (Ridge), L1 (Lasso)
22 - Beginning Model TFServing

But conn.log is not the only log file within Zeek. Let's build some models for DNS and HTTP logs.
I chose unsupervised learning because these data come with no labels.
23 - Continuing Anomaly Learning - Zeek DNS Log - Machine Learning
24 - Continuing Unsupervised Learning - Zeek HTTP Log - Machine Learning
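To make the "no labels" point concrete, here is a minimal sketch of scoring records without any labels. It is not the method used in notebooks 23 and 24; it swaps in a much simpler technique, a modified z-score based on the median absolute deviation, applied to the length of DNS query names. The query names, the choice of feature, and the 3.5 cutoff are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical DNS query names; the last one is deliberately unusual.
queries = [
    "example.com",
    "www.example.com",
    "mail.example.com",
    "login.example.com",
    "static.example.com",
    "a9f3k2l0q8z7x6c5v4b3n2m1p0o9i8u7y6t5r4e3w2q1.example.com",
]


def mad_scores(values):
    """Return modified z-scores based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values sit on the median; nothing can be ranked as unusual
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


# Single feature: length of the query name. No labels are used anywhere.
lengths = [len(q) for q in queries]
scores = mad_scores(lengths)

# 3.5 is a commonly used cutoff for modified z-scores; treat it as tunable
flagged = [q for q, s in zip(queries, scores) if abs(s) > 3.5]
print(flagged)
```

Nothing here says the flagged name is malicious, only that it is unlike the rest. That is the trade-off with unlabeled data: the model can rank records by how unusual they are, and an analyst still has to decide what the unusual ones mean.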

This was a specific ask by someone in one of my classes.
25 - Beginning - Reading Executables and Building a Neural Network to make predictions on suspicious vs non-suspicious

With 25 notebooks in this series, it is quite possible there are things I could have or should have done differently.
If you find anything you think fits that description, drop me a line.
If you find this series beneficial, I would greatly appreciate your feedback.

Some other notebooks I think you might find beneficial:


Get the notebooks by clicking the links above or from my blog: www.securitynik.com or my GitHub: github.com/SecurityNik

6 comments:

  1. I really appreciated your blog. It taught me a great deal. Would you let me know where I could obtain the Zeek dns.log? I am interested in building an ML model that detects certain network attack patterns. Thanks!

    Replies
    1. Sorry about the delay. You can get the log from my GitHub:
      https://github.com/SecurityNik/Data-Science-and-ML/blob/main/Beginning%20Machine%20and%20Deep%20Learning%20with%20Zeek%20logs/zeek_dns.log

  2. Great content! I am trying to follow along. Is it possible to get the conn.log file?

    Replies
    1. Sorry about the delay. Is there a particular notebook you are looking at that requires the conn.log file? Also, if you wish, try installing Zeek and working with other log files as well: https://www.securitynik.com/2020/06/installing-zeek-314-on-ubuntu-2004.html

    2. No worries, thank you so much for the reply. I have been trying to learn ML for computer security in my free time, since we don't cover it at all in uni, and this guide has been great! I've been trying to follow your entire guide "Beginning Machine and Deep Learning with Zeek logs" on GitHub, where you use "conn.log" to prepare a dataset for the entire guide. I installed Zeek, but I have had some trouble with Broker and with producing enough data, and I get very few anomalies from my own use alone. If you can't share it, I would greatly appreciate any other advice for creating my own.

    3. I know why I did not add those files. They are larger than 25MB.
