Many data scientists use both Python and R in their day-to-day work and would like to combine the complementary capabilities of the two languages, for example by using some R plotting and statistics packages from Python. This post shows how to incorporate R functionality into your Python workflow.
rpy2 is a Python module that offers an interface for running embedded R in a Python process. The rpy2 module provides two interfaces: a low-level interface (rpy2.rinterface) and a high-level interface (rpy2.robjects). We will use the high-level interface, rpy2.robjects.
First we will install and set up the rpy2 module, and then perform some basic tasks. We will use Spyder, a Python IDE, to perform these tasks.
rpy2 Setup/Installation:
Download ‘rpy2-2.8.6-cp36-cp36m-win_amd64.whl’ (for Python 3.6 on a 64-bit Windows machine) from https://www.lfd.uci.edu/~gohlke/pythonlibs/. Choose the whl file that matches your system’s configuration and Python version.
Open Spyder and execute the commands below to install rpy2:
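import subprocess
import sys
# pip.main() was removed in pip 10, so run pip as a module of the current interpreter instead
subprocess.check_call([sys.executable, "-m", "pip", "install", "path to rpy2-2.8.6-cp36-cp36m-win_amd64.whl"])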
Replace “path to rpy2-2.8.6-cp36-cp36m-win_amd64.whl” with the actual location of the downloaded file, e.g. “C:/Users/Dell/Downloads/rpy2-2.8.6-cp36-cp36m-win_amd64.whl”.
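Choosing the right wheel comes down to reading the tags in its filename. The sketch below is a minimal illustration and not part of rpy2 or pip: the helper names parse_wheel_tags and expected_python_tag are hypothetical, and the comparison assumes a CPython interpreter.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into name, version and compatibility tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # A wheel filename ends in <python tag>-<abi tag>-<platform tag>.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def expected_python_tag(version_info=sys.version_info):
    """Return the CPython tag (for example 'cp36') for a version tuple."""
    return "cp{}{}".format(version_info[0], version_info[1])


tags = parse_wheel_tags("rpy2-2.8.6-cp36-cp36m-win_amd64.whl")
print("python tag:", tags["python"], "| platform tag:", tags["platform"])
print("matches this interpreter:", tags["python"] == expected_python_tag())
```

If the last line prints False, the wheel was built for a different Python version than the one running in Spyder, and a different file from the download page is probably needed.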
Note: If there are any R packages that you want to use through the rpy2 interface, you will first have to install those packages in R and then set the environment variables in Python.
import os
os.environ['R_HOME'] = 'path to R installation'
os.environ['R_USER'] = 'path to rpy2 installation'
‘path to R installation’, e.g. ‘C:/Program Files/R/R-3.3.0’
‘path to rpy2 installation’, e.g. ‘C:/Users/Dell/Anaconda3/Lib/site-packages/rpy2’
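A mistyped R path tends to surface only later, when rpy2 fails to load, so it can help to check the directory before setting the variables. The following is a minimal sketch under that assumption; configure_r_environment is a hypothetical helper, not an rpy2 function, and the paths are the example values from above.

```python
import os


def configure_r_environment(r_home, r_user, environ=os.environ):
    """Set R_HOME and R_USER after checking that the R directory exists."""
    if not os.path.isdir(r_home):
        raise FileNotFoundError("R installation not found: {}".format(r_home))
    environ["R_HOME"] = r_home
    environ["R_USER"] = r_user
    return environ


# Example paths from this post; adjust them to your own machine.
try:
    configure_r_environment(
        "C:/Program Files/R/R-3.3.0",
        "C:/Users/Dell/Anaconda3/Lib/site-packages/rpy2",
    )
    print("R_HOME set to", os.environ["R_HOME"])
except FileNotFoundError as err:
    print(err)
```

The variables should be set before rpy2 is imported, so a call like this belongs at the top of the script.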
Now we have the interface ready for calling R from Python.
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()
ro.r('salary_data<-read.csv("basic_salary.csv")')
ro.r('head(salary_data)')
pandas2ri automatically converts R data frames to pandas data frames. This requires the pandas conversion to be activated, which is what the call to activate() does.
R commands are enclosed in ro.r(). Here r is the instance of R and can be seen as the entry point to the embedded R process.
Note: Before importing data, locate your data file and check whether it is saved in the default working directory or somewhere else on your system. If it is not stored in the default working directory, you will have to give its full path when importing it.
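When the file lives outside the working directory, the path has to be written into the R command string. R treats a backslash inside a string literal as an escape character, so Windows paths are safest with forward slashes. The sketch below shows one way to build such a command; r_read_csv_command is a hypothetical helper and the Documents folder is an assumed location, not one taken from this post.

```python
from pathlib import PureWindowsPath


def r_read_csv_command(csv_path, r_name="salary_data"):
    """Build the R code string that reads csv_path into the R variable r_name."""
    # as_posix() rewrites the path with forward slashes, which R accepts on Windows.
    return '{}<-read.csv("{}")'.format(r_name, csv_path.as_posix())


# Hypothetical location of the data file outside the working directory.
data_file = PureWindowsPath(r"C:\Users\Dell\Documents\basic_salary.csv")
command = r_read_csv_command(data_file)
print(command)
```

The resulting string can then be passed to ro.r() in place of the hard-coded read.csv call.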
Output of head(salary_data) (header row not shown):
1 Alan Brown GR1 DELHI 17990 16070
2 Agatha Williams GR2 MUMBAI 12390 6630
3 Rajesh Kolte GR1 MUMBAI 19250 14960
4 Ameet Mishra GR2 DELHI 14780 9300
5 Neha Rao GR1 MUMBAI 19235 15200
ro.r('dim(salary_data)')
array([12, 6], dtype=int32)
ro.r('A<-aggregate(ms~Location,data=salary_data,FUN=sum)')
Location ms
1 DELHI 45430
2 MUMBAI 75620