For those who are learning R and may be well-versed in SQL, package sqldf is very useful because it enables us to use SQL commands in R..One who has basic SQL skills can manipulate data frames in R using their SQL skills.
The main function in the package sqldf is sqldf(), which takes a single argument which is an SQL select statement.

Let us see how we can run SQL queries in R usingsqldf() function.

#Install and load sqldf package
install.packages('sqldf')
library(sqldf)
#Import basic_salary data
salary_data<-read.csv("basic_salary.csv", header=T)
head(salary_data, n=3)
##   First_Name Last_Name Grade Location    ba    ms
## 1       Alan     Brown   GR1    DELHI 17990 16070
## 2     Agatha  Williams   GR2   MUMBAI 12390  6630
## 3     Rajesh     Kolte   GR1   MUMBAI 19250 14960

Subsetting Data:

Display columns ‘First_Name’ and ‘ba’ of salary_data

# Column Subsetting
subcols<-"select First_Name,ba from salary_data"
subset_cols<-sqldf(subcols)
subset_cols
##    First_Name    ba
## 1        Alan 17990
## 2      Agatha 12390
## 3      Rajesh 19250
## 4       Ameet 14780
## 5        Neha 19235
## 6       Sagar 13390
## 7       Aaron 23280
## 8        John 13500
## 9       Sneha 20660
## 10     Gaurav 13760
## 11      Adela 13660
## 12       Anup 11960

Aggregating Data:

Calculate sum of variable ‘ba’ by variable ‘Location’

agg_sum<-"select Location,sum(ba) as sum_of_ba from salary_data group by Location"
aggdf<-sqldf(agg_sum)
aggdf
  Location sum_of_ba
1    DELHI     80850
2   MUMBAI    113005

You can use the aggregation functions like sum(), count(), avg() with select statement.
as clause is used to change the name of the column in the output.
group by clause is used with select statement for aggregation in SQL.

Merging Data:

Currently, sqldf does not support right joins or full outer joins. Let’s perform left and inner join on sal_data and bonus_data.

#Import sal_data and bonus_data
sal_data<-read.csv("sal_data.csv", header=T)
bonus_data<-read.csv("bonus_data.csv", header=T)
head(sal_data, n=3)
##   Employee_ID First_Name Last_Name Basic_Salary
## 1      E-1001     Mahesh     Joshi        16860
## 2      E-1002     Rajesh     Kolte        14960
## 3      E-1004      Priya      Jain        12670
head(bonus_data, n=3)
##   Employee_ID Bonus
## 1      E-1001 16070
## 2      E-1003 15200
## 3      E-1004 13490
# Left Join
leftjoin_string<-"select sal_data.*,bonus_data.Bonus from sal_data left join bonus_data on sal_data.Employee_ID = bonus_data.Employee_ID"

sal_join_bonus1<-sqldf(leftjoin_string)
sal_join_bonus1
##   Employee_ID First_Name Last_Name Basic_Salary Bonus
## 1      E-1001     Mahesh     Joshi        16860 16070
## 2      E-1002     Rajesh     Kolte        14960    NA
## 3      E-1004      Priya      Jain        12670 13490
## 4      E-1005      Sneha     Joshi        15660    NA
## 5      E-1007        Ram    Kanade        15850    NA
## 6      E-1008      Nishi    Honrao        15950 15880
## 7      E-1009     Hameed     Singh        15120    NA
# Inner Join
innerjoin_string<-"select sal_data.*, bonus_data.Bonus from sal_data inner join bonus_data on sal_data.Employee_ID = bonus_data.Employee_ID"

sal_join_bonus2<-sqldf(innerjoin_string)
sal_join_bonus2
##   Employee_ID First_Name Last_Name Basic_Salary Bonus
## 1      E-1001     Mahesh     Joshi        16860 16070
## 2      E-1004      Priya      Jain        12670 13490
## 3      E-1008      Nishi    Honrao        15950 15880

Note: basic_salary, sal_data and bonus_data are csv files, saved in working directory. If your data is not saved in working directory then specify the path while importing it.