If you use both Python and R day to day and want to combine their complementary strengths (for example, calling R plotting and statistics packages from Python), this post shows how to bring R functionality into your Python workflow.

rpy2 is a Python module that provides an interface for running embedded R in a Python process. It offers two interfaces: a low-level interface (rpy2.rinterface) and a high-level interface (rpy2.robjects). We will use the high-level interface, rpy2.robjects.

First we will install and set up the rpy2 module, and then perform some basic tasks. We will use Spyder, a Python IDE, for these tasks.

Step 1: rpy2 Setup/Installation:

Download ‘rpy2-2.8.6-cp36-cp36m-win_amd64.whl’ (for Python 3.6 on a 64-bit machine) from https://www.lfd.uci.edu/~gohlke/pythonlibs/ Choose the whl file that matches your system’s configuration and Python version.
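
To see which wheel matches the running interpreter, the Python version tag and the bitness can be read from the standard library. This is a minimal sketch: the helper name wheel_tags is hypothetical, and the mapping to win_amd64/win32 assumes a Windows machine, as in the file name above.

```python
import struct
import sys


def wheel_tags():
    """Return (python_tag, platform_tag) for the running interpreter.

    Hypothetical helper; the platform tag mapping assumes Windows.
    """
    # For example, Python 3.6 gives "cp36".
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size in bytes times 8 gives the interpreter's bitness (32 or 64).
    bits = struct.calcsize("P") * 8
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


python_tag, platform_tag = wheel_tags()
print(python_tag, platform_tag)
```

The two printed tags should both appear in the name of the whl file you download.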

Open Spyder and run the commands below to install rpy2:

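import subprocess
import sys
# pip.main() was removed in pip 10; call pip through the current interpreter instead
subprocess.check_call([sys.executable, "-m", "pip", "install", "path to rpy2-2.8.6-cp36-cp36m-win_amd64.whl"])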

Replace “path to rpy2-2.8.6-cp36-cp36m-win_amd64.whl” with the location of the downloaded file, e.g. “C:/Users/Dell/Downloads/rpy2-2.8.6-cp36-cp36m-win_amd64.whl”

Step 2: Set up the environment variables:

Note: If you want to use any R packages through the rpy2 interface, first install them in R, and then set the environment variables in Python.

import os
os.environ['R_HOME'] = 'path to R installation'
os.environ['R_USER'] = 'path to rpy2 installation'

‘path to R installation’, e.g. ‘C:/Program Files/R/R-3.3.0’
‘path to rpy2 installation’, e.g. ‘C:/Users/Dell/Anaconda3/Lib/site-packages/rpy2’
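
A wrong path tends to show up only later, as an import or startup error, so it can help to check the R directory before setting the variable. A minimal sketch, assuming R_HOME needs to be set before rpy2 is imported; set_r_home is a hypothetical helper, not part of rpy2.

```python
import os


def set_r_home(r_home):
    """Set R_HOME after checking that the directory exists (hypothetical helper)."""
    if not os.path.isdir(r_home):
        raise FileNotFoundError("R installation not found at: {}".format(r_home))
    os.environ["R_HOME"] = r_home
    return r_home


# Example call with the placeholder path from above; uncomment on a machine with R installed:
# set_r_home("C:/Program Files/R/R-3.3.0")
# import rpy2.robjects as ro
```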

Now we have the interface ready for calling R from Python.

Import data:

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()
ro.r('salary_data<-read.csv("basic_salary.csv")')
ro.r('head(salary_data)')

pandas2ri converts R data frames to pandas data frames automatically, but the conversion has to be switched on first, which is what pandas2ri.activate() does.
R commands are passed as strings to ro.r(). r is the instance of R and can be seen as the entry point to the embedded R process.
Note: Before importing data, check where your data file is saved. If it is not in the default working directory, give its full path when importing it.

1       Alan     Brown   GR1    DELHI  17990  16070
2     Agatha  Williams   GR2   MUMBAI  12390   6630
3     Rajesh     Kolte   GR1   MUMBAI  19250  14960
4      Ameet    Mishra   GR2    DELHI  14780   9300
5       Neha       Rao   GR1   MUMBAI  19235  15200
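
When the file is outside the working directory, its full path goes inside the read.csv() call. Backslashes in a Windows path would be read by R as escape characters, so one option is to convert them to forward slashes first. A minimal sketch: read_csv_command and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def read_csv_command(var_name, csv_path):
    """Build the R command string that reads csv_path into var_name."""
    # as_posix() rewrites backslashes as forward slashes, which R accepts on Windows.
    r_path = PureWindowsPath(csv_path).as_posix()
    return '{}<-read.csv("{}")'.format(var_name, r_path)


command = read_csv_command("salary_data", r"C:\Users\Dell\Downloads\basic_salary.csv")
print(command)
# With rpy2 set up as above, the string can be passed on: ro.r(command)
```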

Check the dimensions of salary_data:

ro.r('dim(salary_data)')
array([12,  6], dtype=int32)

New variables can be added to salary_data through ro.r() as well, for example ‘bonus’ containing 5% of ba.

Sum ms by Location with aggregate():

ro.r('A<-aggregate(ms~Location,data=salary_data,FUN=sum)')
  Location     ms
1    DELHI  45430
2   MUMBAI  75620
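
For the ‘bonus’ variable mentioned above, a plain R assignment passed through ro.r() would be enough. The sketch below only builds the command string, so it runs without R. add_column_command is a hypothetical helper, and ba is assumed to be a numeric column of salary_data.

```python
def add_column_command(frame, new_col, expression):
    """Build an R assignment that adds new_col to the data frame named frame."""
    return "{0}${1} <- {2}".format(frame, new_col, expression)


command = add_column_command("salary_data", "bonus", "salary_data$ba * 0.05")
print(command)
# In a session where rpy2 is set up: ro.r(command), then ro.r('head(salary_data)')
```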