R - PROGRAMMING[BCS358B]

1. Demonstrate the steps for installation of R and R Studio. Perform the following:

a) Assign different types of values to variables and display the type of each variable. Assign different types such as Double, Integer, Logical, Complex and Character and understand the difference between each data type.

b) Demonstrate Arithmetic and Logical Operations with simple examples.

c) Demonstrate generation of sequences and creation of vectors.

d) Demonstrate Creation of Matrices.

e) Demonstrate the Creation of Matrices from Vectors using Binding Function.

f) Demonstrate element extraction from vectors, matrices and arrays.

1) A) Assign different types of values to variables and display the type of each variable. Assign different types such as Double, Integer, Logical, Complex and Character and understand the difference between each data type.

Numeric Data type in R

Program:

x = 5.6

print(class(x))

print(typeof(x))

Output:

Even if a whole number such as 5 is assigned to a variable y, R stores it as a double (class "numeric") unless it is written with the L suffix or converted with as.integer().

Program:

y = 5

print(class(y))

print(typeof(y))

Output:

Integer Data type in R

Program:

x = as.integer(5)

print(class(x))

print(typeof(x))

y = 5L

print(class(y))

print(typeof(y))

Output:

Logical Data type in R

Program:

x = 4

y = 3

z = x > y

print(class(z))

print(typeof(z))

Output:

Complex Data type in R

Program:

x = 4 + 3i

print(class(x))

print(typeof(x))

Output:

Character Data type in R

Program:

char = "Geeksforgeeks"

print(class(char))

print(typeof(char))

Output:
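The five types above can be compared side by side. The following is a minimal sketch; the names values, types and classes are illustrative and do not come from the programs above.

```r
# One value of each basic type, stored in a named list
values <- list(double = 5.6, integer = 5L, logical = TRUE,
               complex = 4 + 3i, character = "R")

# typeof() reports the internal storage type of each element
types <- sapply(values, typeof)
print(types)

# class() differs from typeof() only for the double, whose class is "numeric"
classes <- sapply(values, class)
print(classes)

# is.integer() distinguishes 5L (integer) from 5 (double)
print(is.integer(5L))
print(is.integer(5))
```

The L suffix is the only thing that separates the integer 5L from the double 5 when the value is typed directly.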

1) B) Demonstrate Arithmetic and Logical Operations with simple examples.

Arithmetic Operators

Program:

vec1 <- c(0, 2)

vec2 <- c(2, 3)

cat ("Addition of vectors :", vec1 + vec2, "\n")

cat ("Subtraction of vectors :", vec1 - vec2, "\n")

cat ("Multiplication of vectors :", vec1 * vec2, "\n")

cat ("Division of vectors :", vec1 / vec2, "\n")

cat ("Modulo of vectors :", vec1 %% vec2, "\n")

cat ("Power operator :", vec1 ^ vec2)

Output:

Logical Operators

Program:

vec1 <- c(0,2)

vec2 <- c(TRUE,FALSE)

cat ("Element wise AND :", vec1 & vec2, "\n")

cat ("Element wise OR :", vec1 | vec2, "\n")

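# && and || take single values (R 4.3.0 and later raise an error for longer vectors), so only the first elements are compared

cat ("Logical AND :", vec1[1] && vec2[1], "\n")

cat ("Logical OR :", vec1[1] || vec2[1], "\n")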

cat ("Negation :", !vec1)

Output:
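The difference between the element-wise operators (&, |) and the single-value operators (&&, ||) can be sketched as follows. The variable names a and b are illustrative; the values match the vectors used in the program above.

```r
a <- c(0, 2)
b <- c(TRUE, FALSE)

# & and | work element by element; 0 counts as FALSE, any non-zero number as TRUE
elementwise_and <- a & b
elementwise_or  <- a | b

# && and || expect one value on each side and are typically used inside if()
first_and <- a[1] && b[1]
first_or  <- a[1] || b[1]

# ! negates every element
negated <- !a

print(elementwise_and)
print(elementwise_or)
print(first_and)
print(first_or)
print(negated)
```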

1) C) Demonstrate generation of sequences and creation of vectors.

Program:

vec1 <- seq(1, 10, by = 2)

vec2 <- seq(1, 10, length.out = 7)

print(vec1)

print(vec2)

Output:
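Besides seq() with by and length.out, base R offers a few other common ways to generate sequences and build vectors. This is a short sketch; the names v1 to v6 are illustrative.

```r
v1 <- 1:5                       # colon operator: consecutive integers
v2 <- seq(10, 1, by = -3)       # descending sequence with a negative step
v3 <- seq_len(4)                # 1 up to 4
v4 <- rep(c(1, 2), times = 3)   # repeat the whole vector three times
v5 <- rep(c(1, 2), each = 2)    # repeat each element twice
v6 <- c(v1, 100)                # c() combines values into one vector

print(v1)
print(v2)
print(v3)
print(v4)
print(v5)
print(v6)
```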

1) D) Demonstrate Creation of Matrices.

Create a Matrix in R

Program:

# create a 2 by 3 matrix

matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

print(matrix1)

Output:

1) E) Demonstrate the Creation of Matrices from Vectors using Binding Function.

- cbind() function

- rbind() function

- matrix() function

Program:

x <- c(1:5)

y <- c(11:15)

z <- c(21:25)

o <- matrix(c(x, y, z), ncol = 3)

m <- cbind(x, y, z)

n <- rbind(x, y, z)

print(o)

print(m)

print(n)

class(o)

class(m)

class(n)

Output:
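Task f) of Program 1 asks for element extraction from vectors, matrices and arrays. A minimal sketch follows; the objects v, m and arr are illustrative and are not taken from the programs above.

```r
v <- c(10, 20, 30, 40, 50)
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
arr <- array(1:24, dim = c(2, 3, 4))

# Vectors: index by position, by exclusion, or by condition
print(v[2])          # second element
print(v[c(1, 3)])    # first and third elements
print(v[-1])         # everything except the first element
print(v[v > 25])     # elements that satisfy a condition

# Matrices: [row, column]; leaving one side empty selects the whole row or column
print(m[1, 2])       # row 1, column 2
print(m[2, ])        # entire second row
print(m[, 3])        # entire third column

# Arrays: one index per dimension
print(arr[2, 3, 4])  # row 2, column 3, layer 4
print(arr[, , 1])    # the first 2 x 3 layer
```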

2. Assess the Financial Statement of an Organization being supplied with 2 vectors of data: Monthly Revenue and Monthly Expenses for the Financial Year. (You can create your own sample data vector for this experiment.) Calculate the following financial metrics:

Program:

#Data

revenue <- c(14574.49, 7606.46, 8611.41, 9175.41, 8058.65, 8105.44, 11496.28, 9766.09,

10305.32, 14379.96, 10713.97, 15433.50)

expenses <- c(12051.82, 5695.07, 12319.20, 12089.72, 8658.57, 840.20, 3285.73, 5821.12,

6976.93, 16618.61, 10054.37, 3803.96)

revenue

expenses

#profit per month

profit <- revenue - expenses

profit

#30% tax value

tax_30_per <- round(profit * 0.30, 0)

tax_30_per

#profit after tax

profit_after_tax <- profit - tax_30_per

profit_after_tax

#profit margin in %

profit.margin <- round(profit_after_tax/revenue, 2)*100

profit.margin <- paste(profit.margin,"%")

#best month

best_month <- max(profit_after_tax)

#worst month

worst_month <- min(profit_after_tax)

best_month

worst_month

mean_for_year <- mean(profit_after_tax)

mean_for_year

#sorting vector in ascending order

profit_sort_asc <- sort(profit_after_tax, decreasing = F)

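# walk through the sorted values: bad_month ends as the largest value not above the mean,

# good_month is the first (smallest) value above the mean

for(i in profit_sort_asc){

  if(i>mean_for_year){

    good_month = i

    break

  }else{

    bad_month = i

  }

}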

#good month

good_month

#bad month

bad_month

#csv print

data <- data.frame(revenue,expenses)

print(data)

write.csv(data,"F:BE_BIT\\display.csv")

print ('CSV file written Successfully :)')

Output:
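The program above reports profit values rather than the months they belong to. The sketch below labels each value and lists every month above and below the yearly mean. The labels "Month 1" to "Month 12" are an assumption made for illustration, since the source does not say which calendar month the financial year starts in.

```r
revenue <- c(14574.49, 7606.46, 8611.41, 9175.41, 8058.65, 8105.44,
             11496.28, 9766.09, 10305.32, 14379.96, 10713.97, 15433.50)
expenses <- c(12051.82, 5695.07, 12319.20, 12089.72, 8658.57, 840.20,
              3285.73, 5821.12, 6976.93, 16618.61, 10054.37, 3803.96)

profit <- revenue - expenses
tax_30_per <- round(profit * 0.30, 0)
profit_after_tax <- profit - tax_30_per

# Hypothetical labels: the position in the vector stands in for the month
names(profit_after_tax) <- paste("Month", 1:12)

mean_for_year <- mean(profit_after_tax)

# A logical comparison selects all matching months at once, no loop needed
good_months <- names(profit_after_tax)[profit_after_tax > mean_for_year]
bad_months  <- names(profit_after_tax)[profit_after_tax < mean_for_year]

# which.max() and which.min() return the position, so the label is available
best_month  <- names(which.max(profit_after_tax))
worst_month <- names(which.min(profit_after_tax))

print(good_months)
print(bad_months)
print(best_month)
print(worst_month)
```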

3. Develop a program to create two 3 X 3 matrices A and B and perform the following operations: a) Transpose of the matrix b) Addition c) Subtraction d) Multiplication

Code:

# Create matrices A and B

A <- matrix(1:9, nrow = 3)

B <- matrix(9:1, nrow = 3)

# Display matrices A and B

print("Matrix A:")

print(A)

print("Matrix B:")

print(B)

# Transpose of the matrices

print("Transpose of Matrix A:")

print(t(A))

print("Transpose of Matrix B:")

print(t(B))

# Addition of matrices

print("Addition of A and B:")

print(A + B)

# Subtraction of matrices

print("Subtraction of A and B:")

print(A - B)

# Multiplication of matrices

print("Multiplication of A and B:")

print(A %*% B)

Output:
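In R the * operator multiplies matrices element by element, while %*% performs the matrix product used in the program above. The sketch below contrasts the two on the same A and B; the names elementwise, matrix_product and manual_11 are illustrative.

```r
A <- matrix(1:9, nrow = 3)
B <- matrix(9:1, nrow = 3)

# Element-wise product: each entry of A times the entry of B in the same position
elementwise <- A * B

# Matrix product: rows of A combined with columns of B
matrix_product <- A %*% B

# Entry [1, 1] of the matrix product, computed by hand from row 1 of A and column 1 of B
manual_11 <- sum(A[1, ] * B[, 1])

print(elementwise)
print(matrix_product)
print(manual_11)
```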

4. Develop a program to find the factorial of given number using recursive function calls.

What is factorial? How to find it using recursion?

Algorithm

STEP 1: Call function recur_fact()

STEP 2: Pass the number as num to function.

STEP 3: Check if the number > 1 or not, if yes do step 4 otherwise step 5

STEP 4: Return num * recur_fact(num - 1)

STEP 5: Return 1

Program:

recur_fact <- function(num) {

  if(num <= 1) {

    return(1)

  } else {

    return(num * recur_fact(num-1))

  }

}

print(paste("The factorial of 10 is",recur_fact (10)))

Output:
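The factorial of n is the product of all whole numbers from 1 to n, and recursion works because n! = n * (n - 1)! with 0! = 1! = 1. One way to check a recursive version is to compare it with R's built-in factorial(); the sketch below does that for 0 to 10. The names results and matches_builtin are illustrative.

```r
recur_fact <- function(num) {
  # Base case: 0! and 1! are both 1
  if (num <= 1) {
    return(1)
  }
  # Recursive case: n! = n * (n - 1)!
  num * recur_fact(num - 1)
}

# Apply the recursive function to 0, 1, ..., 10
results <- sapply(0:10, recur_fact)

# Compare with the built-in factorial()
matches_builtin <- all(results == factorial(0:10))

print(results)
print(matches_builtin)
```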

5. Develop an R Program using functions to find all the prime numbers up to a specified number by the method of Sieve of Eratosthenes.

Program:

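prime_numbers <- function(n) {

  if (n >= 2) {

    x = seq(2, n)

    prime_nums = c()

    for (i in seq(2, n)) {

      # i is prime if it has not been removed as a multiple of a smaller prime

      if (any(x == i)) {

        prime_nums = c(prime_nums, i)

        # drop all multiples of i, then put i itself back

        x = c(x[(x %% i) != 0], i)

      }

    }

    return(prime_nums)

  }

  else

  {

    stop("Input number should be at least 2.")

  }

}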

prime_numbers(12)

Output:
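The Sieve of Eratosthenes is often written with a logical vector that marks each number as prime or not, crossing out multiples starting from the square of each prime. The following is a sketch of that variant, not the program above; the function name sieve and the variable is_prime are illustrative.

```r
sieve <- function(n) {
  # No primes exist below 2
  if (n < 2) {
    return(integer(0))
  }
  # Start by assuming every number from 1 to n is prime, then rule out 1
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  i <- 2
  # Only primes up to sqrt(n) need to be used for crossing out
  while (i * i <= n) {
    if (is_prime[i]) {
      # Cross out the multiples of i, starting at i squared
      is_prime[seq(i * i, n, by = i)] <- FALSE
    }
    i <- i + 1
  }
  # The positions still marked TRUE are the primes
  which(is_prime)
}

print(sieve(12))
print(length(sieve(100)))
```

Unlike the program above, this version returns an empty vector instead of stopping with an error when n is below 2.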

6. The built-in data set mammals contains data on body weight versus brain weight.

Develop R commands to:

a) Find the Pearson and Spearman correlation coefficients. Are they similar?

b) Plot the data using the plot command.

c) Plot the logarithm (log) of each variable and see if that makes a difference.

Program:

setwd("F:/BIT") #Change Directory

# mammals.csv is assumed to contain the columns bodywt and brainwt used below

mammals <- read.csv("mammals.csv")

print(mammals)

# Part a: Find the Pearson and Spearman correlation coefficients. Are they similar?

pearson_corr <- cor(mammals$brainwt, mammals$bodywt, method = "pearson")

spearman_corr <- cor(mammals$brainwt, mammals$bodywt, method = "spearman")

print(paste("Pearson correlation coefficient:", pearson_corr))

print(paste("Spearman correlation coefficient:", spearman_corr))

# Part b: Plot the data using the plot command

plot(mammals$bodywt, mammals$brainwt, xlab = "Body Weight", ylab = "Brain Weight", main = "Body Weight vs. Brain Weight")

# Part c: Plot the logarithm (log) of each variable and see if that makes a difference

plot(log(mammals$bodywt), log(mammals$brainwt), xlab = "log(Body Weight)", ylab = "log(Brain Weight)", main = "log(Body Weight) vs. log(Brain Weight)")

Output:
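The program above reads the data from a CSV file. A mammals data frame also ships with the MASS package, where the columns are named body and brain rather than bodywt and brainwt. The sketch below assumes MASS is installed (it is distributed with standard R installations) and compares the two coefficients before and after the log transform.

```r
library(MASS)   # provides the mammals data frame (columns: body, brain)
data(mammals)

# Correlations on the raw values
pearson  <- cor(mammals$body, mammals$brain, method = "pearson")
spearman <- cor(mammals$body, mammals$brain, method = "spearman")

# Correlations on the log scale
pearson_log  <- cor(log(mammals$body), log(mammals$brain), method = "pearson")
spearman_log <- cor(log(mammals$body), log(mammals$brain), method = "spearman")

print(c(pearson = pearson, spearman = spearman))
print(c(pearson_log = pearson_log, spearman_log = spearman_log))
```

Spearman's coefficient is based on ranks, and log() preserves the ordering of positive values, so it is the same on both scales. Pearson's coefficient measures linear association and therefore changes when the relationship becomes closer to a straight line after the transform.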

Let us use the built-in dataset airquality, which has daily air quality measurements in New York from May to September 1973. Develop an R program to generate histograms by using appropriate arguments for the following statements.

a) Assigning names, using the air quality data set.

b) Change colors of the Histogram

c) Remove Axis and Add labels to Histogram

d) Change Axis limits of a Histogram

e) Add Density curve to the histogram

Code:

# Load the dataset

data(airquality)

# a) Assigning names, using the air quality data set

names(airquality) <- c("Ozone", "Solar.R", "Wind", "Temp", "Month", "Day")

# b) Change colors of the Histogram

hist(airquality$Ozone, col = "skyblue", main = "Histogram of Ozone Levels", xlab = "Ozone Levels")

# c) Remove Axis and Add labels to Histogram

hist(airquality$Wind, col = "lightgreen", main = "", xlab = "", ylab = "", axes = FALSE)

axis(1, at = seq(0, max(airquality$Wind), by = 5), labels = seq(0, max(airquality$Wind), by = 5))

axis(2)

title(main = "Histogram of Wind Speed", xlab = "Wind Speed", ylab = "Frequency")

# d) Change Axis limits of a Histogram

hist(airquality$Temp, col = "salmon", main = "Histogram of Temperature", xlim = c(50, 100), ylim = c(0, 30))

# e) Add Density curve to the histogram

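# freq = FALSE puts the histogram on the density scale so the curve is visible

hist(airquality$Solar.R, col = "lightblue", main = "Histogram of Solar Radiation", xlab = "Solar Radiation", freq = FALSE)

# Solar.R contains missing values, which density() rejects unless na.rm = TRUE

lines(density(airquality$Solar.R, na.rm = TRUE), col = "red")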
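The numbers behind a histogram can be inspected without drawing it by passing plot = FALSE to hist(). The sketch below does this for the Ozone column; the break points (0 to 180 in steps of 20) are chosen for illustration only.

```r
ozone <- airquality$Ozone

# plot = FALSE returns the histogram object instead of drawing it;
# missing values are dropped before counting
h <- hist(ozone, breaks = seq(0, 180, by = 20), plot = FALSE)

print(h$breaks)   # bin boundaries
print(h$counts)   # number of observations in each bin

# density() needs na.rm = TRUE because Ozone contains missing values
d <- density(ozone, na.rm = TRUE)
print(class(d))
```

The counts add up to the number of non-missing Ozone values, which is a quick check that no observation fell outside the chosen breaks.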

9. Design a data frame in R for storing about 20 employee details. Create a CSV file named “input.csv” that defines all the required information about the employee such as id, name, salary, start_date, dept. Import into R and do the following analysis.

a) Find the total number of rows & columns

b) Find the maximum salary

c) Retrieve the details of the employee with maximum salary

d) Retrieve all the employees working in the IT Department.

e) Retrieve the employees in the IT Department whose salary is greater than 20000 and write these details into another file “output.csv”

Program:

setwd("F:/BIT")

data <- read.csv("input.csv")

print(data)

# a) total number of rows and columns

print(is.data.frame(data))

print(ncol(data))

print(nrow(data))

# b) maximum salary

sal <- max(data$salary)

print(sal)

# c) details of the employee with the maximum salary

retval <- subset(data, salary == max(salary))

print(retval)

# d) all employees working in the IT department

retval <- subset(data, dept == "IT")

print(retval)

# e) IT employees with salary greater than 20000, written to output.csv

info <- subset(data, salary > 20000 & dept == "IT")

print(info)

write.csv(info,"output.csv", row.names = FALSE)

newdata <- read.csv("output.csv")

print(newdata)

# extra query: employees who started after 2014-01-01

retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

print(retval)

Output:
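The program above expects input.csv to exist already. One possible way to build it from a data frame is sketched below. All employee values are made up for illustration, and the file is written to a temporary directory rather than F:/BIT so the sketch runs anywhere.

```r
# Hypothetical data for 20 employees (all values invented for this sketch)
employees <- data.frame(
  id = 1:20,
  name = paste0("Employee", 1:20),
  salary = seq(12000, 50000, by = 2000),
  start_date = as.character(seq(as.Date("2012-01-01"), by = "3 months",
                                length.out = 20)),
  dept = rep(c("IT", "HR", "Finance", "Operations"), times = 5)
)

# Write the CSV to a temporary location and read it back
csv_path <- file.path(tempdir(), "input.csv")
write.csv(employees, csv_path, row.names = FALSE)
data <- read.csv(csv_path)

# Rows and columns, maximum salary, and the IT employees earning more than 20000
print(dim(data))
print(max(data$salary))
it_high <- subset(data, dept == "IT" & salary > 20000)
print(it_high)
```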

10. Use the built-in dataset mtcars, a popular dataset consisting of the design and fuel consumption patterns of 32 different automobiles. The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). Format: a data frame with 32 observations on 11 variables:

a) mpg Miles/(US) gallon.

b) cyl Number of cylinders.

c) disp Displacement (cu.in.).

d) hp Gross horsepower.

e) drat Rear axle ratio.

f) wt Weight (lb/1000).

g) qsec 1/4 mile time.

h) vs Engine shape (0 = V-shaped, 1 = straight).

i) am Transmission (0 = automatic, 1 = manual).

j) gear Number of forward gears.

k) carb Number of carburetors.

Develop R program, to solve the following:

a) What is the total number of observations and variables in the dataset?

b) Find the car with the largest hp and the least hp using suitable functions.

c) Plot histogram / density for each variable and determine whether continuous variables are normally distributed or not. If not, what is their skewness?

d) What is the average difference of gross horse power(hp) between automobiles with 3 and 4 number of cylinders(cyl)? Also determine the difference in their standard deviations.

Program:

install.packages("dplyr")

install.packages("explore")

library(dplyr)

library(explore)

mtcars %>% explore_tbl()

mtcars %>% describe()

mtcars %>%

explore_all()

mtcars %>%

explore(gear)

mtcars %>%

select(gear, mpg, hp, cyl, am) %>%

explore_all(target = gear)

data <- mtcars %>%

mutate(highmpg = if_else(mpg > 25, 1, 0, 0)) %>%

select(-mpg)

data %>% explore(highmpg)

data %>%

select(highmpg, cyl, disp, hp) %>%

explore_all(target = highmpg)

data %>%

select(highmpg, drat, wt, qsec, vs) %>%

explore_all(target = highmpg)

data %>%

select(highmpg, am, gear, carb) %>%

explore_all(target = highmpg)

data %>%

explain_tree(target = highmpg)

data %>% explore(wt, target = highmpg)

data %>% explore(wt, target = highmpg, split = FALSE)

mtcars %>% explore(wt, mpg)

mtcars %>%

explain_tree(target = hp, minsplit=15)

mtcars %>%

select(hp, cyl, mpg) %>% explore_all(target = hp)

Output:
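Questions a), b) and d) can also be answered directly in base R, as sketched below. Two assumptions are made here: skewness is computed with a small helper based on the third central moment (base R has no skewness function), and because mtcars only contains cars with 4, 6 and 8 cylinders, the sketch compares the 4 and 6 cylinder groups instead of 3 and 4.

```r
# a) number of observations (rows) and variables (columns)
dims <- dim(mtcars)
print(dims)

# b) cars with the largest and the smallest gross horsepower
most_hp  <- rownames(mtcars)[which.max(mtcars$hp)]
least_hp <- rownames(mtcars)[which.min(mtcars$hp)]
print(c(most_hp, least_hp))

# c) moment-based skewness (helper written for this sketch)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
skew_all <- sapply(mtcars, skewness)
print(round(skew_all, 2))

# d) mean and standard deviation of hp within each cylinder group
hp_mean <- tapply(mtcars$hp, mtcars$cyl, mean)
hp_sd   <- tapply(mtcars$hp, mtcars$cyl, sd)

# Differences between the 6 and 4 cylinder groups (assumed comparison)
mean_diff <- unname(hp_mean["6"] - hp_mean["4"])
sd_diff   <- unname(hp_sd["6"] - hp_sd["4"])
print(mean_diff)
print(sd_diff)
```

A skewness near zero suggests a roughly symmetric distribution, while a positive value indicates a longer right tail.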

11. Demonstrate the progression of salary with years of experience using a suitable data set (you can create your own dataset). Plot the graph visualizing the best fit line on the plot of the given data points. Plot a curve of Actual Values vs. Predicted Values to show their correlation and the performance of the model. Interpret the meaning of the slope and y-intercept of the line with respect to the given data. Implement using the lm function. Save the graphs and coefficients in files. Attach the predicted values of salaries as a new column to the original data set and save the data as a new CSV file.

Code:

# Step 1: Create a dataset

years_of_experience <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

salaries <- c(30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000, 70000, 75000)

data <- data.frame(Experience = years_of_experience, Salary = salaries)

# Step 2: Plot the data points

plot(data$Experience, data$Salary, main = "Salary vs. Years of Experience", xlab = "Years of Experience", ylab = "Salary", pch = 16, col = "blue")

# Step 3: Fit a linear regression model

model <- lm(Salary ~ Experience, data = data)

# Step 4: Add the best fit line to the plot

abline(model, col = "red")

# Step 5: Predict values using the model

predicted_values <- predict(model)

# Step 6: Plot actual vs predicted values

plot(data$Salary, predicted_values, main = "Actual vs. Predicted Values", xlab = "Actual Salary", ylab = "Predicted Salary", col = "blue", pch = 16)

abline(0, 1, col = "red") # Add a diagonal line for reference

# Step 7: Interpret coefficients

slope <- coef(model)[2]

intercept <- coef(model)[1]

cat("Slope:", slope, "\n")

cat("Y-intercept:", intercept, "\n")

# Step 8: Save the graphs

png("Salary_vs_Experience.png")

plot(data$Experience, data$Salary, main = "Salary vs. Years of Experience", xlab = "Years of Experience", ylab = "Salary", pch = 16, col = "blue")

abline(model, col = "red")

dev.off()

png("Actual_vs_Predicted.png")

plot(data$Salary, predicted_values, main = "Actual vs. Predicted Values", xlab = "Actual Salary", ylab = "Predicted Salary", col = "blue", pch = 16)

abline(0, 1, col = "red")

dev.off()

# Step 9: Attach predicted values as a new column to the original dataset

data$Predicted_Salary <- predicted_values

# Step 10: Save the dataset as a new CSV file

write.csv(data, file = "new_dataset.csv", row.names = FALSE)

# Read the file back to confirm what was written

new_dataset <- read.csv("new_dataset.csv")

print(new_dataset)

Output:
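The task also asks for the coefficients to be interpreted and saved to a file. A minimal sketch follows, using the same sample data as the program above. The file name coefficients.csv and the temporary directory are assumptions made for illustration.

```r
years_of_experience <- 1:10
salaries <- seq(30000, 75000, by = 5000)
data <- data.frame(Experience = years_of_experience, Salary = salaries)

model <- lm(Salary ~ Experience, data = data)
coefs <- coef(model)

# Intercept: predicted salary at 0 years of experience
# Slope: predicted change in salary for each additional year of experience
print(coefs)

# Store the coefficients as a small table and write it to a CSV file
coef_table <- data.frame(Term = names(coefs), Estimate = unname(coefs))
coef_path <- file.path(tempdir(), "coefficients.csv")  # hypothetical file name
write.csv(coef_table, coef_path, row.names = FALSE)
print(read.csv(coef_path))
```

Because the sample salaries rise by exactly 5000 per year starting from 30000 at year 1, the fitted line passes through every point, with a slope of 5000 and an intercept of 25000.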