Hi all, I’ve made my tutorial and posted it to my website. You can find it here.
Let me know what you think!
3 Replies to “Post 9 – R Studio tutorial”
First of all, I really like your detailed explanation of installing RStudio, because some tutorials assume that you already have the software installed. When you plot data, could you also draw a line of best fit, or anything else that illustrates the relationship between the variables?
Hi Tom,
Your question is a good one, and I hope my answer is satisfactory. When we plot two variables against each other, there are several different lines of best fit, each calculated in a different way. The method I can explain best is ordinary least squares (OLS), which finds the line that minimizes the sum of the squared vertical distances between the observations and the line. In that way, the line captures the linear trend in the cloud of points.
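To make "minimizing the squared vertical distances" a bit more concrete, here is a minimal R sketch on made-up numbers (the vectors x and y are hypothetical, not taken from cars2004). For a single predictor, the OLS slope works out to cov(x, y) / var(x), and the intercept is chosen so the line passes through the point of means; lm() should return the same two numbers.

```r
# Hypothetical toy data standing in for hp (x) and weight (y)
x <- c(100, 150, 200, 250, 300)
y <- c(2500, 3000, 3300, 3900, 4200)

# Closed-form OLS estimates for one predictor
b <- cov(x, y) / var(x)        # slope
a <- mean(y) - b * mean(x)     # intercept: line passes through (mean(x), mean(y))

# Sum of squared vertical distances between the points and a candidate line
ssr <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

# lm() fits the same line
fit_toy <- lm(y ~ x)
print(c(intercept = a, slope = b))   # 1660 and 8.6 for this toy data
print(coef(fit_toy))

# Nudging the slope away from the OLS value makes the sum of squares larger
print(ssr(a, b) < ssr(a, b + 0.5))   # TRUE
```

Any other intercept and slope you try will give a larger ssr() value than the OLS pair, which is exactly what "least squares" means.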
To make this line appear on the plot you’ve created, we’ll use a function called abline().
abline() can draw vertical lines with the parameter v = [x_value], horizontal lines with h = [y_value], or lines in slope-intercept form with a = [y_intercept] and b = [slope]. It can also draw the regression line estimated by a fitted model, which is the technique I would use in this scenario. To do that, first remember that the command that creates our plot is:
plot(weight ~ hp, data = cars2004)
From there, we’ll create a simple OLS model of weight ~ hp:
fit <- lm(weight ~ hp, data = cars2004)
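Before drawing anything, it can help to look at the intercept and slope that lm() estimated. The sketch below uses R's built-in mtcars data (columns wt, weight in 1000 lbs, and hp) as a stand-in, on the assumption that not every reader has cars2004 loaded; with cars2004 available, calling coef() and summary() on fit should work the same way.

```r
# Stand-in data: built-in mtcars (wt = weight in 1000 lbs, hp = horsepower)
fit_demo <- lm(wt ~ hp, data = mtcars)

# coef() returns the estimated intercept and slope of the OLS line
print(coef(fit_demo))

# summary() adds standard errors, t-tests and R-squared
print(summary(fit_demo))
```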
And we’ll add the line suggested by that model to our plot:
abline(fit, col = "red")
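The different abline() parameters mentioned above can all be tried on one plot. This is a sketch that again uses the built-in mtcars data as a stand-in for cars2004; the colours and line types are arbitrary choices.

```r
# Stand-in data: built-in mtcars
fit_demo <- lm(wt ~ hp, data = mtcars)

plot(wt ~ hp, data = mtcars)

# Regression line taken directly from the fitted model
abline(fit_demo, col = "red")

# Same line, passing the intercept (a) and slope (b) explicitly
cf <- coef(fit_demo)
abline(a = cf[1], b = cf[2], col = "blue", lty = 2)

# Reference lines through the means: v = vertical, h = horizontal
abline(v = mean(mtcars$hp), h = mean(mtcars$wt), col = "gray")
```

Because an OLS line always passes through the point of means, the two gray reference lines should cross exactly on the red (and dashed blue) regression line.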
I hope this answers your question!
Very well thought out and presented. I feel like I could effectively use this program after reading your tutorial. Great job.