Applying a function to all rows of a column with pandas

A common operation with a pandas dataframe is running a function on every entry (row) of a column. The example below shows how to do this with a simple custom function.

First we'll put together a dataframe. You could equally read your own data in from a file using, for example, pandas.read_csv(). We'll need to import pandas and, just for this example, numpy too:

import pandas as pd
import numpy as np

# Create a dataframe from a dictionary (you could just read one in too)
input_data = {'var1': np.random.randint(low=-50, high=50, size=10),
              'var2': np.random.randint(low=-50, high=50, size=10)}
data = pd.DataFrame(data=input_data)

# Have a look at the data frame you've created
data.head()
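If you'd rather start from a file, pd.read_csv() accepts either a file path or any file-like object. This minimal sketch uses io.StringIO with some made-up CSV text so it runs without a file on disk; the column values are hypothetical and only stand in for your own data.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = """var1,var2
-12,30
7,-45
-3,2
"""

# read_csv takes a path or a file-like object; StringIO wraps the text as one
data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
```

With a real file you would pass its path instead, e.g. pd.read_csv("my_data.csv"), where the filename is just a placeholder.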

Now let’s make the function that we want to apply to each entry of a specified column:

# Create a function to apply to each value in a column
def negative_clean_up(value):
    """Convert negative values to positive and halve them; return other values unchanged.
    """
    if value < 0:
        return abs(value) / 2
    else:
        return value
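Before applying the function to a whole column, it can be worth checking it on a few individual values. This small sketch repeats the function so it runs on its own: negative inputs come back halved and positive, while zero and positive inputs pass through untouched.

```python
def negative_clean_up(value):
    """Convert negative values to positive and halve them; return other values unchanged."""
    if value < 0:
        return abs(value) / 2
    return value


# Try a few sample values by hand
for sample in [-10, -3, 0, 8]:
    print(sample, "->", negative_clean_up(sample))
```

Note that negative inputs come back as floats (because of the division), while non-negative inputs keep their original type.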

The final step is to apply the function to a specific column. apply() returns a new Series rather than changing the dataframe in place, so to keep the result you need to assign it back to the column (as in data['var1'] = ... below). Rather than writing a loop that goes through each row, the pandas.Series.apply() method (selecting a single column gives you a Series) does all of the work for us:

# Apply that function to every row of the column
data['var1'] = data['var1'].apply(negative_clean_up)

# Check the data output
data.head()
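To see what apply() saves you from writing, here is a sketch comparing an explicit loop with Series.apply(). It uses a small fixed column instead of random numbers so the results are repeatable. It also includes a vectorized version using numpy.where(), which isn't part of the approach above but is usually faster on large columns because it works on the whole array at once instead of calling a Python function for each value.

```python
import numpy as np
import pandas as pd


def negative_clean_up(value):
    """Convert negative values to positive and halve them; return other values unchanged."""
    if value < 0:
        return abs(value) / 2
    return value


# Fixed example values so the output is repeatable
data = pd.DataFrame({"var1": [-10, 4, -3, 0, 7]})

# 1. Explicit loop: call the function on each value and collect the results
looped = pd.Series([negative_clean_up(v) for v in data["var1"]], index=data.index)

# 2. Series.apply(): pandas calls the function once per value for us
applied = data["var1"].apply(negative_clean_up)

# 3. Vectorized alternative: choose between the two outcomes for the whole column at once
vectorized = pd.Series(
    np.where(data["var1"] < 0, data["var1"].abs() / 2, data["var1"]),
    index=data.index,
)

print(looped.tolist())
print(applied.tolist())
print(vectorized.tolist())
```

All three give the same values here; the vectorized form only works because this particular function can be expressed as a condition plus two array expressions, which isn't true of every custom function.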

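If you want to apply the function to every value in the dataframe (i.e. across all columns), you can use DataFrame.applymap(). In pandas 2.1 and later, applymap() is deprecated and the same operation is called DataFrame.map():

# Apply the function to every value in the dataframe (use data.map() in pandas 2.1+)
data = data.applymap(negative_clean_up)

Passing negative_clean_up directly does the same as wrapping it in a lambda, as in data.applymap(lambda x: negative_clean_up(x)); lambdas are handy when you want a short one-off function without defining it with def. To read more about lambda functions, see the Python documentation on lambda expressions.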
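If your code needs to run on both older and newer pandas versions, one option is to check whether DataFrame.map exists before choosing which method to call. This is a minimal sketch assuming that DataFrame.map is only present from pandas 2.1 onwards (as noted above); the small dataframe is made up for illustration.

```python
import pandas as pd


def negative_clean_up(value):
    """Convert negative values to positive and halve them; return other values unchanged."""
    if value < 0:
        return abs(value) / 2
    return value


# Made-up example data with negatives in both columns
data = pd.DataFrame({"var1": [-10, 4], "var2": [6, -8]})

# Assumption: DataFrame.map() exists from pandas 2.1; older versions only have applymap()
if hasattr(pd.DataFrame, "map"):
    cleaned = data.map(negative_clean_up)
else:
    cleaned = data.applymap(negative_clean_up)

print(cleaned)
```

If you only support pandas 2.1 or later, calling data.map(negative_clean_up) directly is simpler.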

Written on October 17, 2019