R Principal Component Analysis Case Study, with Code and Data

Date: 2025-07-05 Source: unknown

[Original] Code and data included

For questions, look up “大数据部落” on Taobao

Principal Component Analysis Case Study in R

Question1

Q1.1:

> print(eigen_values)

[1] 2.4802416 0.9897652 0.3565632 0.1734301

Q1.2

> print(eigen_vectors)

           [,1]       [,2]       [,3]        [,4]
[1,] -0.5358995  0.4181809 -0.3412327  0.64922780
[2,] -0.5831836  0.1879856 -0.2681484 -0.74340748
[3,] -0.2781909 -0.8728062 -0.3780158  0.13387773
[4,] -0.5434321 -0.1673186  0.8177779  0.08902432
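
The eigenvalues and eigenvectors in Q1.1 and Q1.2 are the kind of output base R's eigen() returns. The following is a minimal sketch, not the author's code: it assumes the data are four numeric columns (R's built-in USArrests is used here as a stand-in, since the source does not name the data set) and that the correlation matrix is what gets decomposed.

```r
# Assumption: USArrests stands in for the data set, which the source does not name.
X <- USArrests

# Eigen-decomposition of the correlation matrix (PCA on standardized variables).
decomp <- eigen(cor(X))
eigen_values  <- decomp$values    # variances of the principal components, largest first
eigen_vectors <- decomp$vectors   # one loading vector per column; signs are arbitrary

print(eigen_values)
print(eigen_vectors)
```

For a correlation matrix the eigenvalues sum to the number of variables, which gives a quick sanity check on the printed values.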

Q1.3

> print('variance for each eigen_values')

[1] "variance for each eigen_values"

> print(scores)

      Comp.1       Comp.2       Comp.3       Comp.4
0.9655342206 0.027******* 0.0057995349 0.0008489079
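
The values printed in Q1.3 read as each component's share of the total variance. A hedged sketch of how such shares can be computed with princomp(), whose output uses the same Comp.1 to Comp.4 labels; USArrests is again an assumed stand-in, and the choice of the covariance matrix (cor = FALSE) is an assumption rather than something the source states.

```r
# Assumption: USArrests stands in for the data set, which the source does not name.
pca <- princomp(USArrests)   # covariance-based PCA; cor = FALSE is the default

# Share of the total variance carried by each component.
scores <- pca$sdev^2 / sum(pca$sdev^2)
print(scores)

# With cor = TRUE the variables are standardized first, which changes the shares.
pca_cor <- princomp(USArrests, cor = TRUE)
scores_cor <- pca_cor$sdev^2 / sum(pca_cor$sdev^2)
print(scores_cor)
```

On unscaled data a single variable with a large variance can dominate the first component, which is why the two versions can differ sharply.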

Question2:

Q2.1:

See the code.

Q2.2:

The result of ordinary linear regression:

> OLS

Call:

lm(formula = Apps ~ ., data = collegeTrainData)

Coefficients:

(Intercept)      Private       Accept       Enroll    Top10perc    Top25perc  F.Undergrad
 -8.753e+02   -6.409e+02    1.345e+00   -2.841e-01    4.792e+01   -1.465e+01    1.980e-02
P.Undergrad     Outstate   Room.Board        Books     Personal          PhD     Terminal
 -1.612e-03   -4.370e-02    2.831e-01    2.356e-01    8.284e-02    1.552e-01   -9.877e+00
  S.F.Ratio  perc.alumni       Expend    Grad.Rate
  1.547e+01   -6.582e+00    6.118e-02    4.944e+00


The resulting MSE and R-squared are:

> print(mse)

[1] 1454941

> print(rsqured)

[1] 0.9162122
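
The MSE and R-squared above are presumably computed on held-out data after fitting on collegeTrainData. A minimal sketch of that workflow, assuming a simple random train/test split and the usual 1 - SSE/SST definition of R-squared; the built-in mtcars data and the response mpg are stand-ins for the College data and Apps, and the split size is hypothetical.

```r
# Assumption: mtcars stands in for the College data, mpg for the response Apps.
set.seed(1)
train_idx  <- sample(nrow(mtcars), size = 22)   # hypothetical split size
train_data <- mtcars[train_idx, ]
test_data  <- mtcars[-train_idx, ]

# Ordinary least squares fitted on the training rows only.
ols <- lm(mpg ~ wt + hp + disp, data = train_data)

# Evaluate on the held-out rows.
pred <- predict(ols, newdata = test_data)
mse <- mean((test_data$mpg - pred)^2)
rsquared <- 1 - sum((test_data$mpg - pred)^2) /
                sum((test_data$mpg - mean(test_data$mpg))^2)

print(mse)
print(rsquared)
```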

Q2.3:

Using the lambda grid seq(0, 1, 0.05) in R (from 0 to 1 in steps of 0.05), cross-validated ridge regression gives:

> print(mse)

[1] 1464329

> print(ridgeRsquared)

[1] 0.9156716

This is slightly worse than ordinary linear regression.
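
Choosing lambda by cross-validation over the grid seq(0, 1, 0.05) can be sketched as follows. The source does not show which package was used, so this sketch uses a closed-form ridge solution in base R; the lambda scaling (penalty n * lambda), the 5 folds, and the mtcars stand-in data are all assumptions, and values of lambda here are not directly comparable to those of other implementations.

```r
# Assumption: mtcars stands in for the College data, mpg for the response Apps.
# For brevity the data are standardized once, before the folds are formed.
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp", "drat", "qsec")]))
y <- mtcars$mpg - mean(mtcars$mpg)

# Closed-form ridge estimate: (X'X + n * lambda * I)^(-1) X'y.
ridge_fit <- function(X, y, lambda) {
  solve(crossprod(X) + nrow(X) * lambda * diag(ncol(X)), crossprod(X, y))
}

lambdas <- seq(0, 1, 0.05)   # the grid named in the text
set.seed(1)
k <- 5                       # hypothetical number of folds
fold <- sample(rep(seq_len(k), length.out = nrow(X)))

# Mean held-out squared error for every lambda on the grid.
cv_mse <- sapply(lambdas, function(lambda) {
  fold_err <- sapply(seq_len(k), function(f) {
    beta <- ridge_fit(X[fold != f, , drop = FALSE], y[fold != f], lambda)
    mean((y[fold == f] - X[fold == f, , drop = FALSE] %*% beta)^2)
  })
  mean(fold_err)
})

best_lambda <- lambdas[which.min(cv_mse)]
print(best_lambda)
```

With lambda = 0 the ridge estimate reduces to ordinary least squares, so a grid that starts at 0 always includes the unpenalized model as a candidate.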

Q2.4:

Using the same lambda grid, seq(0, 1, 0.05) in R (from 0 to 1 in steps of 0.05), cross-validated lasso regression gives:

> mse

[1] 1471047

> LassoRsquared

[1] 0.9152847
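
Lasso differs from ridge in that its L1 penalty can shrink coefficients exactly to zero. The sketch below illustrates this with a small hand-written cyclic coordinate descent; it is not the author's code (the source does not show which package was used), and the objective scaling, the iteration count, and the mtcars stand-in data are assumptions.

```r
# Assumption: mtcars stands in for the College data, mpg for the response Apps.
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp", "drat", "qsec")]))
y <- mtcars$mpg - mean(mtcars$mpg)

soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Minimizes (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1 by cyclic coordinate descent.
lasso_fit <- function(X, y, lambda, n_iter = 500) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  resid <- y
  for (iter in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      resid <- resid + X[, j] * beta[j]    # take predictor j out of the fit
      z <- sum(X[, j] * resid) / n
      beta[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
      resid <- resid - X[, j] * beta[j]    # put the updated predictor j back
    }
  }
  beta
}

beta_small <- lasso_fit(X, y, lambda = 0.05)
beta_large <- lasso_fit(X, y, lambda = 1)
print(round(cbind(beta_small, beta_large), 4))
```

Any lambda above max(abs(crossprod(X, y))) / n sets every coefficient to zero under this scaling, which bounds the useful range of the grid.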

The following table compares the coefficients of the three models:

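The table shows that Lasso sets the coefficient of “PhD” to 0. Because Lasso therefore uses one fewer predictor while its R-squared is almost the same, its adjusted R-squared may be the best of the three models, although this depends on the sample size and is not computed here.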
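
The comparison of adjusted R-squared can be checked directly from the reported R-squared values. A minimal sketch using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1), with 17 predictors for OLS and 16 for Lasso after PhD is dropped; the number of observations n is not given in the source, so the value below is purely hypothetical.

```r
# Standard adjusted R-squared for n observations and p predictors.
adjusted_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# R-squared values as reported above; 17 predictors for OLS, 16 once Lasso drops PhD.
# Assumption: n = 200 is a hypothetical sample size, not taken from the source.
n <- 200
adj_ols   <- adjusted_r2(0.9162122, n, p = 17)
adj_lasso <- adjusted_r2(0.9152847, n, p = 16)

print(c(OLS = adj_ols, Lasso = adj_lasso))
```

Rerunning this with the actual evaluation-set size shows whether dropping one predictor is enough to offset the slightly lower R-squared.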


Question3:

Q3.1:

> h_1 = sd(F12)*(4/3/length(F12))^(1/5)

> h_1

[1] 0.3101212
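
The expression for h_1 is the normal reference rule for a Gaussian kernel, h = sd * (4 / (3n))^(1/5), which is approximately 1.06 * sd * n^(-1/5). A short sketch showing that the two forms agree; the simulated vector below is a hypothetical stand-in for F12, whose values are not included in the source.

```r
# Assumption: simulated values stand in for F12 (the log-transformed data in the source).
set.seed(1)
F12 <- rnorm(500, mean = 2, sd = 1.5)

# Normal reference rule, written as in the text.
h_1 <- sd(F12) * (4 / 3 / length(F12))^(1 / 5)

# The same rule with the commonly quoted rounded constant.
h_approx <- 1.06 * sd(F12) * length(F12)^(-1 / 5)

print(c(h_1 = h_1, h_approx = h_approx))
```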

Q3.2:

> min(F12)

[1] -2.995732

> max(F12)

[1] 7.930889

The minimum of log_F12 is about -3.00 and the maximum is 7.93, so I evaluate the density on a grid from -3 to 8 in steps of 0.05. The plot of the estimated density follows.
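
Evaluating a kernel density estimate on the grid from -3 to 8 can be sketched as below. The Gaussian kernel is an assumption (it is the kernel the bandwidth rule in Q3.1 is derived for), the function name kde is hypothetical, and the simulated vector again stands in for F12.

```r
# Assumption: simulated values stand in for F12; the Gaussian kernel is also assumed.
set.seed(1)
F12 <- rnorm(500, mean = 2, sd = 1.5)
h_1 <- sd(F12) * (4 / 3 / length(F12))^(1 / 5)

# Kernel density estimate: average of Gaussian bumps of width h centered on the data.
kde <- function(grid, data, h) {
  sapply(grid, function(x) mean(dnorm((x - data) / h)) / h)
}

grid <- seq(-3, 8, by = 0.05)
dens <- kde(grid, F12, h_1)

print(head(cbind(grid, dens)))
```

Calling plot(grid, dens, type = "l") then draws the estimated density curve.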

Q3.3:

I choose four additional bandwidths:

h_2 <- 0.1

h_3 <- 0.2

h_4 <- 0.5

h_5 <- 0.7

This gives the following plot:


The middle one is the plot from Q3.2.

The numerical summary of the estimated density for the five bandwidths is:

We can see that a larger bandwidth produces a smoother, flatter density estimate.
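
The effect of the bandwidth can be reproduced with base R's density(), whose bw argument is the standard deviation of the (default Gaussian) kernel. This is a sketch under assumptions: the simulated vector stands in for F12, so h_1 and the printed numbers will differ from the source.

```r
# Assumption: simulated values stand in for F12, so h_1 differs from the source's value.
set.seed(1)
F12 <- rnorm(500, mean = 2, sd = 1.5)
h_1 <- sd(F12) * (4 / 3 / length(F12))^(1 / 5)
bandwidths <- c(h_2 = 0.1, h_3 = 0.2, h_1 = h_1, h_4 = 0.5, h_5 = 0.7)

# One density estimate per bandwidth, all on the same grid from -3 to 8.
fits <- lapply(bandwidths, function(h) {
  density(F12, bw = h, from = -3, to = 8, n = 221)
})

# Numerical summary of the estimated density values for each bandwidth.
summaries <- sapply(fits, function(fit) summary(fit$y))
print(round(summaries, 4))
```

A smaller bandwidth tends to give a higher maximum and a spikier curve, while a larger one spreads the same probability mass more evenly.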
