Thursday, December 12, 2019

Beginning Pandas

In this second part of my learning ...

* Pandas works well with text-based data and is extremely powerful.
* Pandas allows us to look at data from both a macro and micro perspective.
* One of its main features is the DataFrame. DataFrames can be considered similar to Excel spreadsheets.
* It also provides the Series, a one-dimensional, array-like structure.
* A Pandas index does not have to start at 0.
* A Pandas index does not have to be ordered.
* A Pandas index does not have to be numeric. It can be a list of strings.
* Pandas indexes are very flexible.
* A Series holds both an index and the values that go with it.
* If an index is not explicitly defined, Pandas creates an incrementing integer index starting at 0.
* A Series can even be created from a Python dictionary.
* Pandas can do both position-based and label-based lookup. The two should not be confused.
* For label-based lookup, use ".loc".
* For position-based lookup, use ".iloc".
* A list of index values can be passed to both ".loc" and ".iloc".
* Older versions of Pandas also offered ".ix", which first tried a label-based lookup and fell back to position if that did not work. It was deprecated in Pandas 0.20 and removed in 1.0, so use ".loc" or ".iloc" instead.
* Pandas is built on NumPy.
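
To make the index notes above concrete, here is a minimal sketch. The variable names and values are made up for illustration and are not part of the script further down. It shows an index that is neither zero-based nor ordered, a string index, the default index, and a Series built from a dictionary.

```python
import pandas as pd

# Integer index that does not start at 0 and is not ordered
unordered = pd.Series([10, 20, 30], index=[7, 3, 5])

# Index made of strings instead of numbers
labelled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# No index given: pandas creates an incrementing index 0, 1, 2
default = pd.Series([10, 20, 30])

# From a dictionary: the keys become the index
from_dict = pd.Series({'x': 1, 'y': 2})

# Dictionary plus an explicit index: values are matched by label,
# and any index label with no matching key gets NaN
aligned = pd.Series({'x': 1, 'y': 2}, index=['y', 'z'])

for series in (unordered, labelled, default, from_dict, aligned):
    print(series, end='\n\n')
```

The last case is the one that is easy to trip over: the explicit index does not rename the dictionary keys, it selects among them.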


#!/usr/bin/env python3

import numpy as np
import pandas as pd


# side_by_side function from Wes McKinney, author of Pandas.
# If using Python 3, see this link for an error you may get relating to the adjoin function:
# https://stackoverflow.com/questions/38156965/pandas-cannot-import-name-adjoin
def side_by_side(*objs, **kwds):
    from pandas.io.formats.printing import adjoin
    space = kwds.get('space', 4)
    reprs = [repr(obj).split('\n') for obj in objs]
    print(adjoin(space, *reprs))



def main():
    print('[*] You are running pandas version {}'.format(pd.__version__))
    print('[*] You are running numpy version {}'.format(np.__version__))

    # Create a pandas Series with 6 random integers from 1 to 99 (the upper bound 100 is exclusive) using numpy
    rand_series = pd.Series(np.random.randint(1,100,6), index=['rand0', 'rand1', 'rand2', 'rand3', 'rand4', 'rand5'], name='Rand Series')
    rand_series.index.name = 'Rand Value'
    print('\n[*] \n{}'.format(rand_series))

    # We can also create a series from a python list
    list_series = pd.Series([98, 80, -50, 70, -10, 15], index=['numA','numB', 'numC', 'numD', 'numE', 'numF' ], name='list_series')
    list_series.index.name = 'List Value'
    print('\n[*] Current values in list_series \n{}'.format(list_series))

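    # Create a series from a Python dictionary. When an explicit index is passed along with a dictionary,
    # pandas keeps only the keys that match the index labels. None match below, so every value became NaN (Not a Number).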
    #dict_series = pd.Series({10:1, 8:2, 11:4, 20:3, 5:9, 6:20}, index=['dict0', 'dict1', 'dict2', 'dict3', 'dict4', 'dict5'])

    # Use this instead
    dict_series = pd.Series({'dict0':1, 'dict1':2, 'dict2':4, 'dict3':3, 'dict4':9, 'dict5':20}, name='Dict Series')
    dict_series.index.name = 'Dict Value'
    print('\n[*] Current values in the dict_series \n {}'.format(dict_series))
    print()
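    # Now call the side_by_side function defined above. It prints its own output and returns None,
    # so call it directly instead of formatting its return value into a string.
    print('[*] The 3 series side-by-side')
    side_by_side(rand_series, list_series, dict_series)
    print()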

    #print the value of a specific index using its string name for a label based lookup
    print('\n[*] The value of numB is:{}\n '.format(list_series['numB']))

    # Print the values for a list of labels using label-based lookup
    print('\n[*] The values of rand1 and rand5 are: \n{} '.format(rand_series.loc[['rand1', 'rand5']]))

    # For position-based lookup use iloc. Positions start at 0, so numE is at position 4.
    print('\n[*] The value of numE is:{}\n '.format(list_series.iloc[4]))

    # .ix tried labels first and then positions, but it was deprecated in pandas 0.20 and removed in 1.0.
    # Use .loc for a label-based lookup instead.
    print('\n[*] The values of rand2 and rand3 are:\n{} '.format(rand_series.loc[['rand2', 'rand3']]))
    


if __name__ == '__main__':
    main()
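
Since label-based and position-based lookup are easy to mix up, here is a small standalone sketch of the difference. The series `s` and `t` are illustrative values I made up for this example; they are not taken from the script above. The second series uses integer labels on purpose, because that is where the two lookups give different answers.

```python
import pandas as pd

s = pd.Series([98, 80, -50, 70], index=['numA', 'numB', 'numC', 'numD'])

by_label = s.loc['numB']       # look up by index label
by_position = s.iloc[1]        # look up by position, counting from 0

# Both accessors also accept a list
several_by_label = s.loc[['numA', 'numD']]
several_by_position = s.iloc[[0, 3]]

# With integer labels that are not in positional order, the two differ
t = pd.Series(['first', 'second', 'third'], index=[2, 0, 1])
label_zero = t.loc[0]          # the row whose label is 0
position_zero = t.iloc[0]      # the row stored first

print(by_label, by_position)
print(several_by_label.equals(several_by_position))
print(label_zero, position_zero)
```

Being explicit with ".loc" or ".iloc" avoids having to guess which kind of lookup a plain `t[0]` would perform.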


References
NumPy Reference
pandas: powerful Python data analysis toolkit
Pandas Series

Posts in this series:
Beginning Numpy
Beginning Pandas
Pandas String Operations, etc.
