15 Most-used Python Commands for Data Cleaning

Python (and especially Pandas) is one of my most used packages at work as a Marketing Analyst. Although I also use SciPy and Scikit‑learn for statistical modeling needs, we all know over 50% of an analyst’s time is spent on data cleaning, where the 15 command lines below come in handy.

1.Get current directory

import os

os.getcwd()

2. Show the first two rows of a data frame

df.head(2)

3. Convert column names to list to view

df.columns.tolist()

4. Drop NA from a column

df.A=df.A.dropna()

5. Drop 2 columns from a data frame

df.drop(['A','B'],axis=1)

6. Format date

import pandas as pd

df[‘date’]=pd.to_datetime(df[‘date’],format=”%Y%m”)

7. Count the unique number of rows in column A

df[‘A’].nunique()

8. Count the number of each value in column A

df[‘A’].value_counts()

9. Remove rows where values in df column A is in df_b column A

df=df[-df[‘A’].isin(df_b[‘A’])]

10. Create a data frame grouped by column A and count unique values of the Email column

import pandas as pd

pd.DataFrame(df.groupby([‘A’])[‘Email’].nunique())

11. Replace values in a column

replace_values = {‘Small’ : 1, ‘Medium’ : 2, ‘High’ : 3 }

df = df.replace({'A': replace_values})

12. Append 3 data frames

append_data=open.append([clicks,delivers,not_delivered])

13. Turn a data frame into a pivot table

pd.pivot_table(df, index=[‘A’],columns=’B’, values=’Emails’)

14. Flatten a pivot table

import pandas as pd

pd.melt(df, id_vars=['A'])

15. Join 2 data frames based by merging a common column

import pandas as pd

pd.merge(df1,df2,how=’Outer’,left_on=[‘domain’],right_on=[‘domain’])