stargazer

Explore the universe of insights: A guide to stargazer() in R

Introduction

The stargazer() function is part of the stargazer package. It provides an easy way to create tables of regression results, which makes it a valuable tool for data scientists, researchers, and anyone analyzing statistical models. It presents the results of multiple statistical models side by side in a comprehensive, well-structured format for ease of comparison.

In this guide we explore two uses of stargazer():

Summary Statistics
Output table of Regression results

Let us begin by installing and loading the required package:

install.packages("stargazer")
library(stargazer)

About the dataset

This is a basic health insurance dataset in which every row describes one policyholder. The target variable is "charges", the premium charged to the policyholder.

  age gender    bmi children smoker    region   charges
1  19 female 27.900        0    yes southwest 16884.924
2  18   male 33.770        1     no southeast  1725.552
3  28   male 33.000        3     no southeast  4449.462
4  33   male 22.705        0     no northwest 21984.471
5  32   male 28.880        0     no northwest  3866.855
6  31 female 25.740        0     no southeast  3756.622
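
To follow along without the full file, the six rows printed above can be typed into a data frame by hand. This is only a sketch: the name insurance_head is hypothetical, and the insurance_df used in the rest of this guide holds all 1,338 policyholders, so results from this small stand-in will not match the tables shown later.

```r
# Hypothetical stand-in built from the six rows shown above;
# the full insurance_df used in this guide has 1,338 rows.
insurance_head <- data.frame(
  age      = c(19, 18, 28, 33, 32, 31),
  gender   = c("female", "male", "male", "male", "male", "female"),
  bmi      = c(27.900, 33.770, 33.000, 22.705, 28.880, 25.740),
  children = c(0, 1, 3, 0, 0, 0),
  smoker   = c("yes", "no", "no", "no", "no", "no"),
  region   = c("southwest", "southeast", "southeast",
               "northwest", "northwest", "southeast"),
  charges  = c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622)
)

# Inspect the column types: four columns are numeric, three are text
str(insurance_head)
```

Only the numeric columns (age, bmi, children, charges) would appear in a stargazer summary table of this data frame.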

1) Getting the Summary Statistics with “stargazer” function:

By default, the stargazer function gives summary statistics for the numeric variables in your dataset.

stargazer(insurance_df, type = "text", out="Summary Statistics.txt", title = "Table 1:Summary statistics", covariate.labels=c("Age","Body Mass Index","Number of Children","Charges"))


Table 1:Summary statistics
===================================================================
Statistic            N      Mean     St. Dev.     Min       Max    
-------------------------------------------------------------------
Age                1,338   39.207     14.050      18         64    
Body Mass Index    1,338   30.663     6.098     15.960     53.130  
Number of Children 1,338   1.095      1.205        0         5     
Charges            1,338 13,270.420 12,110.010 1,121.874 63,770.430
-------------------------------------------------------------------

Customization and Options:

The stargazer function provides various options for customizing the appearance of the table. You can adjust the alignment and modify the output. We have used some of the arguments as explained below:

insurance_df: Replace with your own data frame. You can also display a subset of columns or rows, e.g. insurance_df[c("age","bmi")] or insurance_df[insurance_df$age>30,] respectively.
type = "text": Output type; one of "latex", "html", or "text". The default is "latex".
out = "Summary Statistics.txt": Name of the exported file. Once the command is executed, a file with this name is created in the current working directory; it can then be imported into a Word document.
title = "Table 1:Summary statistics": Title for your table.
covariate.labels = c(...): Replaces the variable names with reader-friendly labels.
flip = TRUE: Optional; transposes the table so that the variables appear in columns.
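
The subsetting and flip options mentioned in the list above can be tried out as follows. This is a minimal sketch, assuming a small hypothetical data frame (insurance_head) typed in from the numeric columns of the six rows shown earlier, so the statistics will differ from Table 1.

```r
library(stargazer)

# Hypothetical mini data frame: numeric columns of the six rows shown earlier
insurance_head <- data.frame(
  age     = c(19, 18, 28, 33, 32, 31),
  bmi     = c(27.900, 33.770, 33.000, 22.705, 28.880, 25.740),
  charges = c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622)
)

# Summarise a subset of columns only
stargazer(insurance_head[c("age", "bmi")], type = "text",
          title = "Age and BMI only",
          covariate.labels = c("Age", "Body Mass Index"))

# Summarise a subset of rows (policyholders older than 30)
stargazer(insurance_head[insurance_head$age > 30, ], type = "text")

# Transposed layout: variables in columns, statistics in rows.
# stargazer() prints the table and also returns its lines invisibly.
flipped <- stargazer(insurance_head, type = "text", flip = TRUE)
```

Leaving out the out argument, as done here, prints the table to the console without writing a file.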

Note: if you use the LaTeX or HTML format in a Quarto document, add the chunk option results='asis' to the chunk header, e.g. ```{r insurance_summary, results='asis'}, so that the result is rendered as a table. Otherwise the raw LaTeX or HTML markup is shown as plain code output.

Creating models

# Simple linear regression model
model1_lm <- lm(charges ~ age, data = insurance_df)

# Multiple linear regression model
model2_lm <- lm(charges ~ age + gender + bmi + children + smoker + region, data = insurance_df)

# Binary logistic regression model (1 = charges above 13,000, 0 = otherwise)
insurance_df$high_low <- as.factor(ifelse(insurance_df$charges > 13000, 1, 0))
model3_glm <- glm(high_low ~ age + gender + bmi + children + smoker, data = insurance_df, family = "binomial")
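
Before fitting the logistic model, it can help to check how the 13,000 cutoff splits the data into the two classes. A small sketch, applying the same ifelse() rule to the six charges values shown in the dataset preview (the vector name charges_head is hypothetical):

```r
# charges values from the six rows shown in "About the dataset"
charges_head <- c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622)

# Same rule as above: 1 = charges above 13,000, 0 = otherwise
high_low <- as.factor(ifelse(charges_head > 13000, 1, 0))

# Count how many observations fall in each class
table(high_low)
```

In this small sample, two of the six policyholders fall in the high class (1) and four in the low class (0). Running table(insurance_df$high_low) gives the same check on the full data.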

2) Getting the output table of Regression results

Let us display the summaries of the above three models side by side, using some more arguments as described below:

omit.stat = c("ser","f"): Specifies which statistics to leave out of the table. Here we have removed the residual standard error ("ser") and the F-statistic ("f").
no.space = TRUE: Removes the blank lines between rows.
single.row = TRUE: Puts each regression coefficient and its standard error (or confidence interval) on the same row.
dep.var.labels = c(...) and covariate.labels = c(...): Can be used to rename the dependent variable (y) and the independent variables (x) respectively.

stargazer(model1_lm,model2_lm,model3_glm, type= "text", out = "Model_summary.txt",title="Summary of Regression models",omit.stat=c("ser","f"), no.space=TRUE, single.row=TRUE)


Summary of Regression models
===================================================================================
                                         Dependent variable:                       
                  -----------------------------------------------------------------
                                      charges                         high_low     
                                        OLS                           logistic     
                           (1)                     (2)                   (3)       
-----------------------------------------------------------------------------------
age                257.723*** (22.502)     256.856*** (11.899)    0.079*** (0.008) 
gendermale                                  -131.314 (332.945)    -0.361** (0.184) 
bmi                                        339.193*** (28.599)      0.008 (0.015)  
children                                   475.501*** (137.804)     0.075 (0.075)  
smokeryes                                23,848.530*** (413.153)  8.387*** (1.025) 
regionnorthwest                             -352.964 (476.276)                     
regionsoutheast                           -1,035.022** (478.692)                   
regionsouthwest                            -960.051** (477.933)                    
Constant          3,165.885*** (937.149) -11,938.540*** (987.819) -5.377*** (0.609)
-----------------------------------------------------------------------------------
Observations              1,338                   1,338                 1,338      
R2                        0.089                   0.751                            
Adjusted R2               0.089                   0.749                            
Log Likelihood                                                        -393.980     
Akaike Inf. Crit.                                                      799.960     
===================================================================================
Note:                                                   *p<0.1; **p<0.05; ***p<0.01

Here, the table shows the independent variables (x) in rows and the models in columns, grouped by dependent variable (y). Each cell shows a coefficient with its standard error in parentheses, and the number of asterisks indicates the significance level given in the note below the table. The statistics at the bottom show the number of observations for each model, R-squared and adjusted R-squared for the two linear (OLS) models, and the log likelihood and Akaike information criterion for the binary logistic regression.
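
The dep.var.labels and covariate.labels arguments described earlier were not used in the call above. The following is a minimal sketch of how they might be applied. It assumes the built-in mtcars dataset as a stand-in for the insurance data, and the model names fit_a and fit_b are hypothetical.

```r
library(stargazer)

# mtcars ships with R and is used here only as a stand-in dataset
fit_a <- lm(mpg ~ wt, data = mtcars)
fit_b <- lm(mpg ~ wt + hp, data = mtcars)

# Labels are applied in the order the coefficients appear in the table
# (wt first, then hp); the Constant row keeps its default name.
labelled <- stargazer(fit_a, fit_b, type = "text",
                      title = "Fuel economy models",
                      dep.var.labels = "Miles per gallon",
                      covariate.labels = c("Weight (1000 lbs)", "Horsepower"),
                      omit.stat = c("ser", "f"),
                      no.space = TRUE)
```

Because covariate.labels is matched by position rather than by name, it is worth checking the labelled table against an unlabelled one whenever the set of predictors changes.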

Conclusion:

The stargazer function in R is a versatile tool for presenting statistical models with elegance and precision. As we explore the universe of data, this function serves as our guiding star, illuminating the path to clearer insights and more informed decisions. May your statistical models shine brightly, just like the stars in the night sky!