Explore the universe of insights: A guide to stargazer() with R
Introduction
The stargazer() function, from the stargazer package, provides an easy way to create well-formatted tables of regression results, making it an invaluable tool for data scientists, researchers, and anyone analyzing statistical models. It presents the results of several statistical models side by side in a comprehensive, well-structured format that is easy to compare.
In this guide we explore two uses of stargazer():
Summary statistics
An output table of regression results
Let us begin by installing and loading the package:
install.packages("stargazer")
library(stargazer)
About the dataset
This is a basic health insurance dataset in which each row describes one policyholder. The target variable is charges, the premium charged to the policyholder. The first six rows are shown below:
  age gender    bmi children smoker    region   charges
1  19 female 27.900        0    yes southwest 16884.924
2  18   male 33.770        1     no southeast  1725.552
3  28   male 33.000        3     no southeast  4449.462
4  33   male 22.705        0     no northwest 21984.471
5  32   male 28.880        0     no northwest  3866.855
6  31 female 25.740        0     no southeast  3756.622
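The guide does not show how insurance_df was loaded, and the full file is not included. To try the calls below without it, here is a minimal sketch that rebuilds only the six rows printed above as a data frame. The name insurance_sample is our own placeholder, and because it holds just six rows, any statistics computed from it will differ from the 1,338-row results shown later.

```r
# Hypothetical stand-in: only the six rows printed above, not the full dataset
insurance_sample <- data.frame(
  age      = c(19, 18, 28, 33, 32, 31),
  gender   = factor(c("female", "male", "male", "male", "male", "female")),
  bmi      = c(27.900, 33.770, 33.000, 22.705, 28.880, 25.740),
  children = c(0, 1, 3, 0, 0, 0),
  smoker   = factor(c("yes", "no", "no", "no", "no", "no")),
  region   = factor(c("southwest", "southeast", "southeast",
                      "northwest", "northwest", "southeast")),
  charges  = c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622)
)

# Inspect column types: gender, smoker and region are categorical (factors),
# while age, bmi, children and charges are numeric
str(insurance_sample)
```

Four columns are numeric, which matches the four variables that appear in the summary table in the next section.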
1) Getting summary statistics with the stargazer() function:
By default, stargazer() reports summary statistics for the numeric variables in a data frame; categorical columns such as gender, smoker and region are left out.
stargazer(insurance_df, type = "text", out="Summary Statistics.txt", title = "Table 1:Summary statistics", covariate.labels=c("Age","Body Mass Index","Number of Children","Charges"))
Table 1:Summary statistics
======================================================================
Statistic          N      Mean        St. Dev.   Min        Max
----------------------------------------------------------------------
Age                1,338  39.207      14.050     18         64
Body Mass Index    1,338  30.663      6.098      15.960     53.130
Number of Children 1,338  1.095       1.205      0          5
Charges            1,338  13,270.420  12,110.010 1,121.874  63,770.430
----------------------------------------------------------------------
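Each column of Table 1 is a standard per-variable statistic. As a rough cross-check of what the columns mean, the base-R sketch below computes N, mean, standard deviation (R's sd() uses the n - 1 denominator), minimum and maximum for the numeric columns only. It runs on a few sample rows from the dataset section, so its numbers will not match the full-data table; the names df and summary_tbl are our own.

```r
# A few of the sample rows from the dataset section (stand-in for insurance_df)
df <- data.frame(
  age     = c(19, 18, 28, 33, 32, 31),
  gender  = c("female", "male", "male", "male", "male", "female"),
  bmi     = c(27.900, 33.770, 33.000, 22.705, 28.880, 25.740),
  charges = c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622)
)

# Keep only numeric columns; gender is dropped, as in the stargazer summary
numeric_cols <- df[sapply(df, is.numeric)]

# One row per variable: N, Mean, St. Dev., Min, Max (missing values ignored)
summary_tbl <- t(sapply(numeric_cols, function(x) c(
  N       = sum(!is.na(x)),
  Mean    = mean(x, na.rm = TRUE),
  St.Dev. = sd(x, na.rm = TRUE),
  Min     = min(x, na.rm = TRUE),
  Max     = max(x, na.rm = TRUE)
)))
round(summary_tbl, 3)
```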
Customization and Options:
The stargazer() function provides many options for customizing the appearance of the table. The arguments used above are:
insurance_df: the data frame to summarize. You can also pass a subset of columns or rows, e.g. insurance_df[c("age", "bmi")] or insurance_df[insurance_df$age > 30, ] respectively.
type = "text": the output format, one of "latex" (the default), "html" or "text".
out = "Summary Statistics.txt": the name of the exported file. When the command runs, it writes the table to this file in the current working directory, from where it can be imported into a Word document.
title = "Table 1:Summary statistics": the title of the table.
covariate.labels = c(…): replaces the variable names with reader-friendly labels.
You can also set flip = TRUE to transpose the table, so that the variables appear as columns.
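The subsetting, labelling, export and transposing options above can be combined. The sketch below is an illustration only: sample_df is a placeholder built from a few sample rows, the output goes to a temporary file rather than the working directory, and summary.stat is an additional stargazer argument (not used elsewhere in this guide) that selects which statistics to report. The stargazer calls are wrapped in requireNamespace() so the base-R part still runs if the package is not installed.

```r
# Placeholder data: a few of the sample rows shown in the dataset section
sample_df <- data.frame(
  age     = c(19, 18, 28, 33, 32, 31),
  bmi     = c(27.900, 33.770, 33.000, 22.705, 28.880, 25.740),
  charges = c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622)
)

cols_subset <- sample_df[c("age", "bmi")]        # subset of columns
rows_subset <- sample_df[sample_df$age > 30, ]   # subset of rows (age over 30)

if (requireNamespace("stargazer", quietly = TRUE)) {
  # Two variables only, with reader-friendly labels
  stargazer::stargazer(cols_subset, type = "text",
                       title = "Age and BMI",
                       covariate.labels = c("Age", "Body Mass Index"))

  # Policyholders over 30, transposed, reporting only N, mean and sd,
  # exported to a temporary text file
  out_file <- tempfile(fileext = ".txt")
  stargazer::stargazer(rows_subset, type = "text", flip = TRUE,
                       summary.stat = c("n", "mean", "sd"),
                       out = out_file)
}
```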
Note: if you use LaTeX or HTML output in a Quarto document, add the chunk option results='asis', e.g. ```{r, insurance_summary, results='asis'}, so that the table is rendered; otherwise it is printed as raw code.
Creating models
#Linear regression model
model1_lm <- lm(charges ~ age, data = insurance_df)
#Multiple regression model
model2_lm <- lm(charges ~ age + gender + bmi + children + smoker + region, data = insurance_df)
#Binary logistic regression model: high_low is 1 if charges exceed 13,000, otherwise 0
insurance_df$high_low <- as.factor(ifelse(insurance_df$charges > 13000, 1, 0))
model3_glm <- glm(high_low ~ age + gender + bmi + children + smoker, data = insurance_df, family = "binomial")
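The logistic model needs a binary outcome, so charges are split at 13,000, close to the mean charge of 13,270 in Table 1. A quick way to sanity-check a recoding like this is to apply it to a handful of values and tabulate the result; the sketch below does so for the six sample charges. Because high_low is a factor, glm() with family = "binomial" treats its first level ("0") as failure and models the probability of "1", i.e. a charge above 13,000.

```r
# The six sample charges from the dataset section
charges <- c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622)

# Same recoding as in the model code: 1 if the charge exceeds 13,000, else 0
high_low <- as.factor(ifelse(charges > 13000, 1, 0))

levels(high_low)   # "0" "1": glm() models the probability of level "1"
table(high_low)    # number of policyholders in each group
```

On these six values, four charges fall below the cut-off and two above it.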
2) Getting an output table of regression results
Let us display the three models above side by side in a single table, using a few more arguments:
omit.stat = c("ser", "f"): specifies which statistics to leave out of the table. Here we drop the residual standard error ("ser") and the F-statistic ("f").
no.space = TRUE: removes the blank lines between rows.
single.row = TRUE: puts each coefficient and its standard error (or confidence interval) on the same row.
dep.var.labels = c(…) and covariate.labels = c(…): rename the dependent variables (y) and the independent variables (x), respectively.
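Of these arguments, dep.var.labels and covariate.labels are not used in this guide's own call. As a minimal sketch of all five together, assuming R's built-in mtcars data as a stand-in for the insurance file, the block below fits three models of the same kinds (simple OLS, multiple OLS, logistic). The models and labels are our own illustrative choices, not part of the original analysis. Covariate labels are applied in the order the variables appear in the table, so they follow the order of the right-hand-side terms.

```r
# Illustrative stand-in models on the built-in mtcars data
m1 <- lm(mpg ~ wt, data = mtcars)                             # simple OLS
m2 <- lm(mpg ~ wt + hp, data = mtcars)                        # multiple OLS
m3 <- glm(am ~ wt + hp, data = mtcars, family = "binomial")   # logistic (am is 0/1)

if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(
    m1, m2, m3,
    type = "text",
    title = "Illustrative models on mtcars",
    dep.var.labels = c("Miles per gallon", "Manual transmission"),  # one per dependent variable
    covariate.labels = c("Weight (1,000 lbs)", "Horsepower"),       # in table order
    omit.stat = c("ser", "f"),   # drop residual standard error and F-statistic
    no.space = TRUE,             # no blank lines between rows
    single.row = TRUE            # coefficient and standard error on one row
  )
}
```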
stargazer(model1_lm,model2_lm,model3_glm, type= "text", out = "Model_summary.txt",title="Summary of Regression models",omit.stat=c("ser","f"), no.space=TRUE, single.row=TRUE)
Summary of Regression models
=======================================================================================
                   Dependent variable:
                   --------------------------------------------------------------------
                   charges                                            high_low
                   OLS                                                logistic
                   (1)                      (2)                       (3)
---------------------------------------------------------------------------------------
age                257.723*** (22.502)      256.856*** (11.899)       0.079*** (0.008)
gendermale                                  -131.314 (332.945)        -0.361** (0.184)
bmi                                         339.193*** (28.599)       0.008 (0.015)
children                                    475.501*** (137.804)      0.075 (0.075)
smokeryes                                   23,848.530*** (413.153)   8.387*** (1.025)
regionnorthwest                             -352.964 (476.276)
regionsoutheast                             -1,035.022** (478.692)
regionsouthwest                             -960.051** (477.933)
Constant           3,165.885*** (937.149)   -11,938.540*** (987.819)  -5.377*** (0.609)
---------------------------------------------------------------------------------------
Observations       1,338                    1,338                     1,338
R2                 0.089                    0.751
Adjusted R2        0.089                    0.749
Log Likelihood                                                        -393.980
Akaike Inf. Crit.                                                     799.960
=======================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
Here the independent variables (x) are listed down the rows, and the models, grouped by dependent variable (y), run across the columns. Each cell shows a coefficient with its standard error in parentheses, and the number of stars indicates its significance level, as defined in the note at the bottom of the table. The summary rows report the number of observations for each model, R2 and adjusted R2 for the two OLS models (simple and multiple linear regression), and the log likelihood and Akaike information criterion (AIC) for the binary logistic regression.
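Two small calculations help when reading the table. The stars follow the cut-offs in the table note (*p<0.1; **p<0.05; ***p<0.01), and the logistic coefficients in column (3) are on the log-odds scale, so exponentiating one gives an odds ratio. The sketch below (the helper name stars_for is our own) reproduces the star rule and converts the age coefficient of 0.079: each additional year of age multiplies the odds of a charge above 13,000 by about 1.08, holding the other variables fixed.

```r
# Star rule from the table note: *p<0.1; **p<0.05; ***p<0.01
stars_for <- function(p) {
  ifelse(p < 0.01, "***",
         ifelse(p < 0.05, "**",
                ifelse(p < 0.1, "*", "")))
}

stars_for(c(0.005, 0.03, 0.07, 0.2))

# Logistic coefficients are log-odds; exp() converts them to odds ratios
age_log_odds <- 0.079                 # age coefficient in model (3)
age_odds_ratio <- exp(age_log_odds)
round(age_odds_ratio, 3)
```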
Conclusion:
The stargazer function in R is a versatile tool for presenting statistical models with elegance and precision. As we explore the universe of data, this function serves as our guiding star, illuminating the path to clearer insights and more informed decisions. May your statistical models shine brightly, just like the stars in the night sky!