---
title: pandas DataFrame
---
## DataFrame

In this section you will take a detailed look at the other important pandas data type, the `DataFrame`.

In pandas, a `DataFrame` is an object that represents multi-dimensional data. It is mainly used for 2-dimensional, tabular data with rows and columns, and can be thought of as a collection of `Series` that share an index.

A `DataFrame` can also represent 3-dimensional data by using a multi-level index (`MultiIndex`). This replaces the old `Panel` object, which was deprecated and has since been removed from pandas. Conceptually, a 3-dimensional `DataFrame` consists of multiple 2-D `DataFrame`s.

### Basic syntax of DataFrame

```python
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
```

`data` : ndarray, dict, Series, or another DataFrame.

`index` : array-like or Index. Defaults to a `RangeIndex` (0, 1, 2, ..., n-1). Represents the row labels.

`columns` : array-like or Index. Defaults to a `RangeIndex` (0, 1, 2, ..., n-1). Represents the column labels.

`dtype` : dtype, default None. Data type to use for the DataFrame.

### Creating DataFrame in different ways:

As a first step, import the pandas module:

```python
import pandas as pd
```

### Create an empty DataFrame:

```python
df = pd.DataFrame()
print(df)
```

    Empty DataFrame
    Columns: []
    Index: []

### Create using a list:

```python
input_data = [['Mark', 87], ['Tom', 78], ['Monika', 97]]
df = pd.DataFrame(data=input_data)
print(df)
```

            0   1
    0    Mark  87
    1     Tom  78
    2  Monika  97

```python
print('DataFrame with column and row index labels:')
roll_no = [223, 224, 225]
df = pd.DataFrame(data=input_data, index=roll_no, columns=['Name', 'Score'])
print(df)
```

    DataFrame with column and row index labels:
           Name  Score
    223    Mark     87
    224     Tom     78
    225  Monika     97

### Create using a dict:

```python
input_data = {'Name': ['Mark', 'Tom', 'Monika'], 'Score': [87, 78, 97]}
# Cast only the Score column to float; passing dtype=float to the constructor
# may fail on recent pandas versions because Name holds strings.
df = pd.DataFrame(data=input_data).astype({'Score': float})  # Notice that the score is changed to float.
print(df)
```

         Name  Score
    0    Mark   87.0
    1     Tom   78.0
    2  Monika   97.0

### Create using a list of dict:

```python
input_data = [{'Name': 'Mark', 'Score': 87},
              {'Name': 'Tom', 'Score': 78},
              {'Name': 'Monika', 'Score': 97}]
df = pd.DataFrame(data=input_data, index=roll_no)
print(df)
```

           Name  Score
    223    Mark     87
    224     Tom     78
    225  Monika     97

### Create using a dict of Series:

```python
input_data = {'Name': pd.Series(['Mark', 'Tom', 'Monika', 'John']),
              'Score': pd.Series([87, 78, 97])}
df = pd.DataFrame(input_data)
print(df)
```

         Name  Score
    0    Mark   87.0
    1     Tom   78.0
    2  Monika   97.0
    3    John    NaN

Notice in the output above that John's score is NaN (not a number). The `Score` Series has no entry for index 3, and pandas fills missing values with `numpy.nan` by default. This is also why the column becomes float.

### DataFrame Manipulations:

You now have a good idea of how to create a DataFrame and of the different kinds of input you can use to create it. Next, we look at the manipulation operations available on a DataFrame.

### Column Manipulation:

The column-level operations discussed here are:

* Column selection
* Column addition
* Column deletion

```python
score_sheet = {'Name': pd.Series(['Mark', 'Tom', 'Monika', 'Lilly', 'Sam']),
               'Maths': pd.Series([89, 87, 83, 78, 77]),
               'Science': pd.Series([78, 88, 66, 0, 88])}
DF = pd.DataFrame(score_sheet)
print(DF)
```

         Name  Maths  Science
    0    Mark     89       78
    1     Tom     87       88
    2  Monika     83       66
    3   Lilly     78        0
    4     Sam     77       88

### Column selection:

```python
DF['Name']  # Selecting a particular column
```

    0      Mark
    1       Tom
    2    Monika
    3     Lilly
    4       Sam
    Name: Name, dtype: object

```python
type(DF['Maths'])
```

    pandas.core.series.Series

Notice that each column in a DataFrame is a `Series`, so it supports all of the Series operations. For example:

```python
DF['Maths'].max()  # Finding the max score in maths
```

    89

```python
math = DF[['Name', 'Maths']]  # Selecting multiple columns
print(math)
```

         Name  Maths
    0    Mark     89
    1     Tom     87
    2  Monika     83
    3   Lilly     78
    4     Sam     77

### Column addition:

```python
DF['English'] = pd.Series([88, 89, 98, 88, 0])  # Adding a new subject, English.
print(DF)
```

         Name  Maths  Science  English
    0    Mark     89       78       88
    1     Tom     87       88       89
    2  Monika     83       66       98
    3   Lilly     78        0       88
    4     Sam     77       88        0

```python
DF['Total Score'] = DF['Maths'] + DF['Science'] + DF['English']
print(DF)
```

         Name  Maths  Science  English  Total Score
    0    Mark     89       78       88          255
    1     Tom     87       88       89          264
    2  Monika     83       66       98          247
    3   Lilly     78        0       88          166
    4     Sam     77       88        0          165

### Column deletion:

```python
# Using the del statement:
del DF['English']
print(DF)
```

         Name  Maths  Science  Total Score
    0    Mark     89       78          255
    1     Tom     87       88          264
    2  Monika     83       66          247
    3   Lilly     78        0          166
    4     Sam     77       88          165

```python
# Using the pop method, which removes the column and returns it as a Series:
DF.pop('Total Score')
print(DF)
```

         Name  Maths  Science
    0    Mark     89       78
    1     Tom     87       88
    2  Monika     83       66
    3   Lilly     78        0
    4     Sam     77       88

### Row Manipulation:

As with columns, a `DataFrame` has similar operations for rows. You will now look at these row-level operations in detail, using the same `DataFrame` `DF` created before.

### Row selection:

There are two indexers available on a DataFrame for selection, `.iloc[]` and `.loc[]`:

* `.iloc[]` selects based on integer position.
* `.loc[]` selects based on the label value.

First, the `.iloc[]` indexer:

```python
DF.iloc[2]  # Returns the row at position 2 (the third row).
```

    Name       Monika
    Maths          83
    Science        66
    Name: 2, dtype: object

```python
type(DF.iloc[3])
```

    pandas.core.series.Series

The important thing to notice here is that a Series is returned again. Not only columns but also rows are returned as a Series.

```python
print(DF[2:4])  # Slicing the rows
```

         Name  Maths  Science
    2  Monika     83       66
    3   Lilly     78        0

### Row addition:

```python
new_student = pd.DataFrame(data=[['Ben', 79, 89]],
                           columns=['Name', 'Maths', 'Science'],
                           index=[5])
# DataFrame.append was removed in pandas 2.0; pd.concat adds the new row instead.
DF = pd.concat([DF, new_student])  # Added a new row.
print(DF)
```

         Name  Maths  Science
    0    Mark     89       78
    1     Tom     87       88
    2  Monika     83       66
    3   Lilly     78        0
    4     Sam     77       88
    5     Ben     79       89

### Row deletion:

```python
# Rows are deleted with the drop method, using the index label.
# drop returns a new DataFrame, so the result is assigned back to DF.
DF = DF.drop(3)
print(DF)
```

         Name  Maths  Science
    0    Mark     89       78
    1     Tom     87       88
    2  Monika     83       66
    4     Sam     77       88
    5     Ben     79       89

#### More Information:

[DataFrame](http://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.html)
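
The row selection section mentions `.loc[]` but only demonstrates `.iloc[]`. As a minimal sketch of the difference, assume a small score sheet indexed by roll number. The `scores` frame and its values are hypothetical and used only for illustration; they are not the `DF` built earlier.

```python
import pandas as pd

# Hypothetical score sheet indexed by roll number (illustrative values only).
scores = pd.DataFrame(
    {'Name': ['Mark', 'Tom', 'Monika'], 'Maths': [89, 87, 83]},
    index=[223, 224, 225],
)

# .loc selects by index label: the row whose label is 224.
by_label = scores.loc[224]

# .iloc selects by integer position: position 1 is the second row.
by_position = scores.iloc[1]

# Both lookups reach the same row in this frame.
print(by_label['Name'], by_position['Name'])

# Label slices include both endpoints; position slices exclude the stop.
label_slice = scores.loc[223:224]
position_slice = scores.iloc[0:1]
print(len(label_slice), len(position_slice))
```

When the index is the default `RangeIndex`, labels and positions coincide, so the two indexers only visibly differ once the index carries custom labels such as roll numbers.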