Thursday, December 12, 2019

Wireless Security Analysis with Pandas

For this part of my Pandas learning, I drove around the neighbourhood collecting wireless information to use in this analysis.

To prepare to capture the traffic, I did the following:
root@securitynik:~/PA-Pandas# airmon-ng start wlan0
root@securitynik:~/PA-Pandas# ifconfig 

...

wlan0mon: flags=867<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI>  mtu 1500
        unspec 00-C0-CA-75-0B-E5-30-3A-00-00-00-00-00-00-00-00  txqueuelen 1000  (UNSPEC)
        RX packets 92483  bytes 19493415 (18.5 MiB)
        RX errors 0  dropped 968  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Then start the actual capture:
root@securitynik:~/PA-Pandas# airodump-ng --write securitynik-Wi-Fi-Test wlan0mon

Once the capture started, the files it created included:
root@securitynik:~/PA-Pandas# ls -al *.csv
-rw-r--r-- 1 root root  436908 Nov 17 17:41 securitynik-Wi-Fi-Test-01.csv
-rw-r--r-- 1 root root  207204 Nov 17 17:41 securitynik-Wi-Fi-Test-01.kismet.csv
-rw-r--r-- 1 root root 5387235 Nov 17 17:41 securitynik-Wi-Fi-Test-01.log.csv

We will use the "securitynik-Wi-Fi-Test-01.csv" file. This file currently has 3884 lines as shown below.
root@securitynik:~/PA-Pandas# cat securitynik-Wi-Fi-Test-01.csv | wc --lines

3884

Here is some sample data from the "securitynik-Wi-Fi-Test-01.csv" file:
root@securitynik:~/PA-Pandas# cat securitynik-Wi-Fi-Test-01.csv | more

BSSID, First time seen, Last time seen, channel, Speed, Privacy, Cipher, Authentication, Power, # beacons, # IV, LAN IP, ID-length, ESSID, Key
00:15:FF:5D:60:C4, 2019-11-17 16:25:17, 2019-11-17 17:27:03, -1,  -1, , ,   ,  -1,        0,        0,   0.  0.  0.  0,   0, ,
88:DC:96:25:E8:94, 2019-11-17 16:25:11, 2019-11-17 17:27:04,  3, 270, WPA2, CCMP, PSK, -59,        8,        0,   0.  0.  0.  0,   6, @ASPMC,
70:B3:17:1C:BA:80, 2019-11-17 16:25:11, 2019-11-17 17:27:01,  1, 195, WPA2, CCMP, PSK, -60,       17,        0,   0.  0.  0.  0,  13, SnaponIncMISS,
00:02:6F:FD:FD:1C, 2019-11-17 16:25:15, 2019-11-17 17:27:02, 11, 130, WPA2, CCMP, PSK, -61,        9,        1,   0.  0.  0.  0,   5, FGSEG,
82:D2:94:B7:86:83, 2019-11-17 16:25:19, 2019-11-17 17:27:01,  1, 360, WPA2, CCMP, PSK, -61,       54,        0,   0.  0.  0.  0,   0, ,

...

While trying to read the file in Pandas, an error occurred. It seems this is because the file contains two sections, each with its own header. Therefore, we will split it into two.

Below, we see the headers of these two sections:

root@securitynik:~/PA-Pandas# cat securitynik-Wi-Fi-Test-01.csv | grep -i BSSID
BSSID, First time seen, Last time seen, channel, Speed, Privacy, Cipher, Authentication, Power, # beacons, # IV, LAN IP, ID-length, ESSID, Key
Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed ESSIDs
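
To make the split concrete, here is a minimal sketch on a tiny hand-made sample in the same layout. The sample rows, the `SAMPLE` name and the `split_sections` helper are my own assumptions for illustration and are not part of the captured data. The idea is simply to cut the text where the client header starts and hand each half to Pandas separately.

```python
from io import StringIO

import pandas as pd

# Hypothetical, hand-made sample in the airodump-ng CSV layout (not real capture data)
SAMPLE = """\
BSSID, First time seen, Last time seen, channel, Speed, Privacy, Cipher, Authentication, Power, # beacons, # IV, LAN IP, ID-length, ESSID, Key
AA:BB:CC:00:00:01, 2019-11-17 16:25:11, 2019-11-17 17:27:04,  3, 270, WPA2, CCMP, PSK, -59,        8,        0,   0.  0.  0.  0,   6, TestAP,
AA:BB:CC:00:00:02, 2019-11-17 16:25:19, 2019-11-17 17:27:01,  1, 360, WPA2, CCMP, PSK, -61,       54,        0,   0.  0.  0.  0,   0, ,

Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed ESSIDs
11:22:33:00:00:01, 2019-11-17 16:25:20, 2019-11-17 16:30:00, -70,       12, AA:BB:CC:00:00:01, TestAP
"""


def split_sections(text):
    """Return (ap_text, client_text), cut where the client header begins."""
    split_at = text.index('Station MAC,')
    return text[:split_at], text[split_at:]


ap_text, client_text = split_sections(SAMPLE)

# skipinitialspace strips the blank that follows each comma in this layout
ap_df = pd.read_csv(StringIO(ap_text), skipinitialspace=True)
client_df = pd.read_csv(StringIO(client_text), skipinitialspace=True)

print(ap_df.shape)
print(client_df.shape)
```

Each half now has a single header row, so each one loads as its own DataFrame. A real capture may still contain rows with extra commas, which this clean sample does not exercise.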

Now that we have a snapshot of the data, let's switch to our code:

#!/usr/bin/env python3

'''
This code is me learning about Pandas Data Science. 
This is based on the Pandas for Data Science training from Pentester Academy
I decided to do things from my own perspective to some extent
I drove around the neighbourhood and captured the Wi-Fi information, so that
I can get my own perspective from my own data

Feel free to use this code as you see fit

Author: Nik Alleyne
Author Blog: www.securitynik.com


'''

from io import StringIO
import netaddr 
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import subprocess as sp
import sys



def usage():
    print('[*] Usage Information: ')
    print('[*] ./pandas-Wi-Fi.py <filename>. e.g. ./pandas-Wi-Fi.py my.csv')
    print('[*] Author: Nik Alleyne')
    print('[*] Author Blog: www.securitynik.com')
    sys.exit(-1)



def wifi_data_analysis(csv_file):
    print('[*] Opening the csv file ... ')
    # Use a context manager so the file handle is closed after reading
    with open(csv_file, 'r') as csv_handle:
        wifi_data = csv_handle.read()

    '''
    Need to split the data into two sections before creating the Pandas dataframe
    As can be seen below, there are two headers sections. One of these is for the Access Point
    and the other is for the client

root@securitynik:~/PA-Pandas# cat securitynik-Wi-Fi-Test-01.csv | grep -i BSSID
BSSID, First time seen, Last time seen, channel, Speed, Privacy, Cipher, Authentication, Power, # beacons, # IV, LAN IP, ID-length, ESSID, Key
Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed ESSIDs

    '''
    print('[+] Splitting the data into an AP and a client section')
    client_header = 'Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed ESSIDs'
    ap_client_split = wifi_data.index(client_header)

    # Get AP section
    wifi_ap_data = StringIO(wifi_data[:ap_client_split])

    # Get Client section
    wifi_client_data = StringIO(wifi_data[ap_client_split:])

    '''
    This was a pain in the ass. Kept getting errors when attempting to create the DataFrame.
    Fortunately, this link helped to solve the problem
    https://stackoverflow.com/questions/18039057/python-pandas-error-tokenizing-data
    '''

    print('[+] Creating Access Point DataFrame ...')
    # on_bad_lines='skip' (pandas 1.3+) replaces the error_bad_lines/warn_bad_lines arguments, which pandas 2.0 removed
    access_point_df = pd.read_csv(wifi_ap_data, sep=',', header=0, skipinitialspace=True, on_bad_lines='skip', parse_dates=['First time seen', 'Last time seen'])

    print('\n[*] Access Point column information before conversion {}'.format(access_point_df.columns))

    # My understanding is that we would be better off renaming those columns to something without space
    access_point_df.rename(columns={ 'BSSID' : 'BSSID', 'First time seen' : 'FirstTimeSeen', 'Last time seen' : 'LastTimeSeen', 'channel' : 'channel', 'Speed' : 'Speed', 'Privacy' : 'Privacy', 'Cipher' : 'Cipher', 'Authentication' : 'Authentication', 'Power' : 'Power', '# beacons' : 'BeaconsCount', '# IV' : 'IV', 'LAN IP' : 'LAN-IP', 'ID-length' : 'ID-Length', 'ESSID' : 'ESSID', 'Key' : 'Key' }, inplace=True)




    print('\n[*] Sample Access Point data \n {}'.format(access_point_df.head()))
    print('\n[*] Getting overall count of Access Point Data \n {}'.format(access_point_df.count()))
    print('\n[*] Overall you have {} rows and {} columns in the AP dataframe \n'.format(access_point_df.shape[0], access_point_df.shape[1]))
    print('\n[*] Data types in the AP dataframe \n {}'.format(access_point_df.dtypes))

    # Looking for the unique Access Point SSID
    print('\n[*] Here are the ESSIDs found ... \n {}' .format(list(set(access_point_df.ESSID))))
    
    # Get a count of the total unique SSIDs returned
    print('\n[*] Total unique SSIDs returned was:{} \n' .format(len(list(set(access_point_df.ESSID)))))

    # Looking for situations where there is NaN
    print('[*] Do we have any "nan" values \n {}'.format(access_point_df.ESSID.hasnans))

    # Now that we see we have ESSID with NaN values, let's replace them
    # Assigning the result back avoids the chained inplace call, which newer pandas versions warn about
    access_point_df['ESSID'] = access_point_df['ESSID'].fillna('HIDDEN ESSID')

    # Let's now check again for those nan values
    print('[*] Do we have any "nan" values \n {}'.format(access_point_df.ESSID.hasnans))
    print('[*] First 10 records after the replacement of nans \n {}' .format(access_point_df.head(10)))
    # Good stuff, we replaced all the nan values

    # Looking at the frequency with which the SSIDs have been seen
    print('[*] Frequency of the SSID seen \n {}'.format(access_point_df.ESSID.value_counts()))

    # Plot the graph of the usage
    access_point_df.ESSID.value_counts().plot(kind='pie', figsize=(10,5))
    plt.show()

    # Looking at the channels in use
    print('\n[*] Frequency of the channels being seen \n {}'.format(access_point_df.channel.value_counts()))
    access_point_df.channel.value_counts().plot(kind='bar', figsize=(10,5))
    plt.show()
    
    # Time now for some grouping
    # first group by ESSID and the channels they are seen on
    print('\n[*] Grouping by SSID and channel ... \n {}'.format(access_point_df.groupby(['ESSID', 'channel'])['channel'].count()))

    # Looking at unstack
    print('\n[*] Looking at unstacking ... \n {}'.format(access_point_df.groupby(['ESSID', 'channel'])['channel'].count().unstack()))
    
    # The result above produced a number of channels with 'nan' values. Time to fill that with 0s
    print('\n[*] Filled the NANs with 0 ... \n {}'.format(access_point_df.groupby(['ESSID', 'channel'])['channel'].count().unstack().fillna(0)))

    # Create graph of the grouping information
    access_point_df.groupby(['ESSID', 'channel'])['channel'].count().unstack().fillna(0).plot(kind='bar', stacked=True, figsize=(10,5)).legend(bbox_to_anchor=(1.1,1))
    plt.show()


    # Extract the OUI from the MAC address - basically the first 3 bytes
    oui_manufacturer = access_point_df.BSSID.str.extract('(..:..:..)', expand=False)
    print('\n[*] Here are the first 10 manufacturer OUIs \n {} '.format(oui_manufacturer.head(10)))

    # Print the counts of each OUI
    print('\n[*] Here is your manufacturers OUI with the count \n {} '.format(oui_manufacturer.value_counts()))

    
    '''
        Client information and analysis start from here
    '''
    print('*'*100)
    print('[+] Creating Client DataFrame ...')
    # on_bad_lines='skip' (pandas 1.3+) replaces the error_bad_lines/warn_bad_lines arguments, which pandas 2.0 removed
    client_df = pd.read_csv(wifi_client_data, sep=',', header=0, skipinitialspace=True, on_bad_lines='skip', parse_dates=['First time seen', 'Last time seen'])
    print('\n[*] Client column information before conversion {}'.format(client_df.columns))

    # Once again, addressing the space issue between column names
    client_df.rename(columns= {'Station MAC' : 'StationMAC', 'First time seen' : 'FirstTimeSeen', 'Last time seen' : 'LastTimeSeen', 'Power' : 'Power', '# packets' : 'PacketCount', 'BSSID' : 'BSSID', 'Probed ESSIDs' : 'ProbedESSIDs'}, inplace=True)

    print('\n[*] Sample client data \n {}'.format(client_df.head()))
    print('\n[*] Getting overall count of client Data \n {}'.format(client_df.count()))
    print('\n[*] Overall you have {} rows and {} columns in the client dataframe \n'.format(client_df.shape[0], client_df.shape[1]))
    print('\n[*] Data types in the client dataframe \n {}'.format(client_df.dtypes))

    # Taking a look at the BSSIDs the clients are associated with
    print('\n[*] Here are the BSSIDs your clients are associated with \n {}'.format(client_df.BSSID.head()))

    # Looking at the probed ESSIDs
    print('\n[*] Here are the ESSIDs the clients are probing for ... \n {}'.format(client_df.ProbedESSIDs))



def main():
    sp.call(['clear'])
    sns.set_color_codes('dark')
    # Checking the command line to ensure 1 argument is passed to the command
    if (len(sys.argv) != 2 ):
        usage()
    else:
        print('[*] Reading command line arguments ... ')
        if (sys.argv[1].endswith('.csv')):
            print('[*] Found a CSV file ... ')
        else:
            print('[!] File is not .csv file. Exiting!!')
            sys.exit(-1)
            
    # Reading the CSV file
    wifi_data_analysis(sys.argv[1])



if __name__ == '__main__':
    main()
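
If you want to try the main Access Point steps without a capture file, the following is a minimal sketch on a small hand-made DataFrame. The rows and the names `ap_df`, `per_channel` and `oui_counts` are invented for illustration. It walks through the same three ideas as the script: replacing missing ESSIDs, grouping by ESSID and channel with `unstack`, and counting OUIs. It uses a slightly stricter regular expression than the script, which is my own choice and not a requirement.

```python
import pandas as pd

# Hypothetical access point rows (invented for illustration, not from the capture)
ap_df = pd.DataFrame({
    'BSSID': ['AA:BB:CC:00:00:01', 'AA:BB:CC:00:00:02',
              'DD:EE:FF:00:00:03', 'DD:EE:FF:00:00:04'],
    'channel': [1, 6, 1, 11],
    'ESSID': ['HomeNet', None, 'HomeNet', 'CafeNet'],
})

# Replace missing ESSIDs by assigning the result back to the column
ap_df['ESSID'] = ap_df['ESSID'].fillna('HIDDEN ESSID')

# Count APs per (ESSID, channel), then pivot the channels into columns;
# fill_value=0 fills the combinations that never occurred
per_channel = ap_df.groupby(['ESSID', 'channel']).size().unstack(fill_value=0)

# The OUI is the first three octets of the MAC address
oui_pattern = r'^((?:[0-9A-Fa-f]{2}:){2}[0-9A-Fa-f]{2})'
oui_counts = ap_df['BSSID'].str.extract(oui_pattern, expand=False).value_counts()

print(per_channel)
print(oui_counts)
```

Passing `fill_value=0` to `unstack` does the same job as the separate `fillna(0)` call in the script, and keeps the counts as integers.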

Posts in this series:
Beginning Numpy
Beginning Pandas
Pandas String Operations, etc.
