R is available for Linux, macOS, and Windows.
I also recommend using RStudio as your R code editor. Be sure to install R first, because RStudio itself does not include R.
R is a dialect of the S language. It is a case-sensitive, interpreted language. You can enter commands one at a time at the command prompt (>) or run a set of commands from a source file.
After opening R, make sure you are in the correct working directory. Note that paths in R are NOT written the way Windows usually shows them: use forward slashes (/) or doubled backslashes (\\) instead of single backslashes.
getwd()
setwd("Drive:/path/to/your/working/directory")
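As a small sketch of the path format, the lines below build a path with forward slashes and switch the working directory temporarily. The names "project", "data", and "input.txt" are hypothetical placeholders, not files from these notes.

```r
# Build a path with forward slashes; the folder and file names are made up.
p <- file.path("project", "data", "input.txt")
p

# Inside an R string a single backslash starts an escape sequence, so on
# Windows write "C:/Users/me" or "C:\\Users\\me" rather than single backslashes.
old <- getwd()      # remember the current working directory
setwd(tempdir())    # move to a temporary directory that always exists
getwd()
setwd(old)          # restore the original working directory
```

Saving the old directory and restoring it afterwards keeps the rest of the session unaffected.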
To open or create a script, click File – New script or File – Open script, then type your code in the Script window. To run code, select the lines and press Ctrl+R, or select the lines, right-click, and click Run line.
Install the packages we need.
if(!require(alr4)){
install.packages("alr4")
}
library(alr4)
The official manuals are always available for reference. Try the following commands to get assistance from the built-in help system.
help.start() # general help
help(factorial) # help about function factorial
?factorial # same thing
apropos("factorial") # list all functions containing string factorial
example(factorial) # show an example of function factorial
To import data, we have different functions to deal with different situations.
data(package = "alr4")
data(ais)
data <- read.table("path/to/your/dataset", header = TRUE)
header = TRUE indicates that the file contains the variable names in its first line. By default, header = FALSE in read.table(). Also take a look at the function read.csv for comma-separated files (.csv).
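To make the difference between the two readers concrete, here is a minimal sketch that writes a tiny comma-separated file to a temporary location and reads it back both ways. The column names and values are invented for illustration.

```r
# Write a tiny comma-separated file to a temporary location (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("height,weight", "170,65", "182,80"), tmp)

# read.csv() uses header = TRUE and sep = "," by default.
d1 <- read.csv(tmp)

# read.table() needs both options spelled out to read the same file.
d2 <- read.table(tmp, header = TRUE, sep = ",")

str(d1)              # structure of the imported data frame
all.equal(d1, d2)    # compare the two imports
```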
Using the colon operator (:), seq, c, and rep:
4:1
## [1] 4 3 2 1
seq(1, 4, by = 0.5)
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
seq(1, 4, length.out = 6)
## [1] 1.0 1.6 2.2 2.8 3.4 4.0
c(1,3,5,6)
## [1] 1 3 5 6
rep(1:3, times=2)
## [1] 1 2 3 1 2 3
rep(1:3, each=2)
## [1] 1 1 2 2 3 3
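These functions can also be combined. The following is only an illustrative sketch; the object names odd, both, and pattern are made up.

```r
odd <- seq(1, 9, by = 2)                      # 1 3 5 7 9
both <- c(odd, 10:12)                         # c() concatenates vectors
pattern <- rep(c("a", "b"), times = c(2, 3))  # "a" "a" "b" "b" "b"
length(both)                                  # 5 + 3 = 8 entries
```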
rbind(1:3, 4:6)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
cbind(1:2, 3:4)
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
(mat1 <- matrix(1:6, nrow = 3, ncol = 2))
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
(mat2 <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE))
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
mat1 + 1 # the scalar 1 is recycled and added to every element
## [,1] [,2]
## [1,] 2 5
## [2,] 3 6
## [3,] 4 7
mat1^2
## [,1] [,2]
## [1,] 1 16
## [2,] 4 25
## [3,] 9 36
mat1 + mat2
## [,1] [,2]
## [1,] 2 6
## [2,] 5 9
## [3,] 8 12
mat1 * mat2
## [,1] [,2]
## [1,] 1 8
## [2,] 6 20
## [3,] 15 36
mat1 %*% t(mat2)
## [,1] [,2] [,3]
## [1,] 9 19 29
## [2,] 12 26 40
## [3,] 15 33 51
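The difference between the element-wise product * and the matrix product %*% comes down to dimensions. A short sketch, using hypothetical names A and B for matrices built the same way as mat1 and mat2:

```r
A <- matrix(1:6, nrow = 3, ncol = 2)                 # 3 x 2, filled by column
B <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)   # 3 x 2, filled by row

dim(A * B)        # element-wise product keeps the 3 x 2 shape
dim(A %*% t(B))   # (3 x 2) %*% (2 x 3) gives a 3 x 3 matrix
dim(t(A) %*% B)   # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix

# A %*% B is not defined: the inner dimensions (2 and 3) do not match.
res <- try(A %*% B, silent = TRUE)
inherits(res, "try-error")
```

Wrapping the failing product in try() lets the script continue while still showing that R rejects non-conformable matrices.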
x=seq(1,20,by=2) # a vector (1,3,5,7,9,11,13,15,17,19)
x[1] # 1st entry: 1
## [1] 1
x[2:4] # 2nd-4th entries: 3 5 7
## [1] 3 5 7
x[c(1,3,8,9)] # 1st,3rd,8th & 9th entries: 1 5 15 17
## [1] 1 5 15 17
x[-1] # entries excluding 1st: 3 5 7 9 11 13 15 17 19
## [1] 3 5 7 9 11 13 15 17 19
x>5&x<10 # a logical vector (F,F,F,T,T,F,F,F,F,F)
## [1] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
x[x>5&x<10] # entries of x where condition is TRUE: 7 9
## [1] 7 9
G=matrix(1:12,nrow=4,byrow=TRUE) # a 4x3 matrix with 1:12 arranged by row
G[3,2] # (3,2) entry: 8
## [1] 8
G[4,] # 4th row: 10 11 12
## [1] 10 11 12
G[,1] # 1st column: 1 4 7 10
## [1] 1 4 7 10
G[c(2,4),3]
## [1] 6 12
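A few more indexing patterns that build on the ones above, offered as a sketch rather than a complete list. The objects x and G are recreated here so the lines run on their own.

```r
x <- seq(1, 20, by = 2)
which(x > 5 & x < 10)       # positions where the condition is TRUE: 4 5
x[-(1:3)]                   # drop the first three entries

G <- matrix(1:12, nrow = 4, byrow = TRUE)
G[G[, 1] > 4, ]             # rows whose first column exceeds 4 (rows 3 and 4)
r <- G[4, , drop = FALSE]   # keep the result as a 1 x 3 matrix, not a vector
dim(r)
```

By default a single row or column is simplified to a plain vector; drop = FALSE is useful when later code expects a matrix.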
Length of an object: length(x).
Sum of values in a vector: sum(x).
Dimension of a matrix: dim(x).
Sample mean: mean(x).
Median: median(x).
Sample quantiles: quantile(x, probs=).
Sample variance / covariance matrix: var(x).
Sample covariance of x and y: cov(x,y).
Sample correlation between x and y: cor(x,y).
Standard deviation: sd(x).
Object summaries: summary(x).
Trace of a matrix: sum(diag(x)) (base R has no tr function).
Transpose of a matrix: t(x).
Inverse of a matrix: solve(x).
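A brief sketch of several of these functions applied to made-up inputs; the vector x and the matrix M are hypothetical. The trace is computed as sum(diag(M)), which uses only base R.

```r
x <- c(2, 4, 4, 5, 7, 9)             # made-up sample
mean(x)
median(x)                            # average of the middle values 4 and 5
quantile(x, probs = c(0.25, 0.75))   # lower and upper quartiles
all.equal(sd(x), sqrt(var(x)))       # sd is the square root of the variance

M <- matrix(c(2, 1, 1, 3), nrow = 2) # made-up invertible 2 x 2 matrix
sum(diag(M))                         # trace: 2 + 3
M %*% solve(M)                       # identity matrix, up to rounding error
```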
pdf of Uniform(1,3) at 1.5: dunif(1.5,1,3).
0.95 quantile (0.05 upper quantile) of N(0,1): qnorm(0.95).
p-value of a chi-square statistic 3.84 with df=1: 1 - pchisq(3.84,1).
Generate 1000 random numbers from N(2,16): rnorm(1000,2,4).
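The prefixes d, p, q, and r give the density, the distribution function, the quantile function, and random draws of a distribution. A minimal sketch follows; the seed value is an arbitrary choice made here for reproducibility.

```r
set.seed(1)                          # arbitrary seed so the draws are reproducible
z <- rnorm(1000, mean = 2, sd = 4)   # third argument is the sd, so N(2, 16) uses sd = 4

q <- qnorm(0.95)                     # 0.95 quantile of N(0, 1)
pnorm(q)                             # pnorm() inverts qnorm(): gives back 0.95

dunif(1.5, min = 1, max = 3)         # density is 1 / (3 - 1) = 0.5 on [1, 3]

p <- pchisq(3.84, df = 1, lower.tail = FALSE)   # same as 1 - pchisq(3.84, 1)
p
```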
FunctionName <- function(arg1, arg2, ...) {
  # function body
}
Inside the function body, you can use control-flow constructs such as if, if…else, for, while, break, next, and return.
my_add <- function(x, y){
z <- x+y
return(z)
}
my_add(3,4)
## [1] 7
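To show several of these constructs working together, here is a small sketch. The function sum_positive is a hypothetical example written for this note, not part of any package.

```r
# Sum the positive entries of a numeric vector, stopping at the first NA.
sum_positive <- function(v) {
  total <- 0
  for (value in v) {
    if (is.na(value)) {
      break            # leave the loop at the first missing value
    } else if (value <= 0) {
      next             # skip non-positive values
    }
    total <- total + value
  }
  return(total)
}

sum_positive(c(3, -1, 4, NA, 10))   # 3 + 4 = 7; the 10 after NA is never reached
```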
The data set consists of the variables blood pressure, age, weight, and resting pulse rate. (1) Fit a linear regression model for blood pressure on age, weight, and resting pulse rate. (2) Find the residual sum of squares RSS. (3) Find the coefficient of determination R². (4) Test whether the effect of resting pulse rate is significant by constructing an ANOVA table.
dat2=read.csv("BloodPressure.csv")
y=dat2$Blood.pressure # extract the variables
x1=dat2$Age
x2=dat2$Weight
x3=dat2$Pulse
fit=lm(y ~ x1+x2+x3) # fit a linear model
summary(fit) # summary of the model
rss=sum(fit$residuals^2) # residual sum of squares
rss
summary(fit)$r.squared
fit2=lm(y ~ x1+x2) # a new model without x3
anova(fit2,fit)
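Since BloodPressure.csv is not bundled with R, the same four steps can be tried on the built-in mtcars data. This is only a stand-in sketch: the variables mpg, wt, hp, and qsec are chosen here for illustration and have nothing to do with the blood pressure data.

```r
# Full model and a reduced model without the last predictor (qsec).
fit_full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_reduced <- lm(mpg ~ wt + hp, data = mtcars)

# Residual sum of squares; for an lm fit, deviance() returns the same number.
rss <- sum(residuals(fit_full)^2)
all.equal(rss, deviance(fit_full))

# R-squared computed by hand as 1 - RSS / TSS.
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2  <- 1 - rss / tss
all.equal(r2, summary(fit_full)$r.squared)

# F test for the extra term; with one added predictor, F equals its squared t value.
tab <- anova(fit_reduced, fit_full)
tab
```

Passing data = mtcars to lm() avoids extracting each column into a separate variable first.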
ubar <- sum(u)/length(u)
ubar
sum((u-ubar)^2)/(length(u)-1)
Show that applying the mean and var functions to u gives the same results as in part (b).
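A quick way to check this numerically is sketched below. The vector u here is hypothetical, since the exercise's own u is not reproduced in these notes.

```r
u <- c(1, 3, 5, 7, 9)                       # hypothetical data, not the exercise's u
ubar <- sum(u) / length(u)                  # sample mean by hand
s2 <- sum((u - ubar)^2) / (length(u) - 1)   # sample variance by hand

all.equal(ubar, mean(u))                    # compare with the built-in mean
all.equal(s2, var(u))                       # compare with the built-in variance
```

The agreement reflects that var() divides by n - 1, not n.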
Based on the results in (a), construct the matrix A in R with dimension 3 × 6.
Thanks to the previous TAs for their resources. This file draws on Ms. YIN Jie’s tutorial notes from Fall 2021.