R Regression Self-Test Exercises with Code Answers
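# When a single observation in a linear regression is anomalous, it can strongly shift the means of the predictor and the response, and therefore the estimate of beta, so the fitted model can change substantially.
# Points whose standardized residuals are greater than 2 or less than -2 may be outliers.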
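# The rule of thumb above can be tried on a small simulated example. This is
# only a sketch: the data, the seed and the names fit, std_res and flagged are
# made up for illustration and are not part of the exercise.

```r
# Simulated data (hypothetical, not from the exercise): a clean linear trend
# with one observation deliberately shifted far from the line.
set.seed(1)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30, sd = 1)
y[15] <- y[15] + 10                  # inject a single outlier

fit <- lm(y ~ x)
std_res <- rstandard(fit)            # standardized residuals
flagged <- which(abs(std_res) > 2)   # rule of thumb: |residual| > 2
flagged
```

# Observation 15, the shifted point, should be among the flagged indices.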
## 3. (1pt)
## How could you deal with outliers in order to improve the accuracy of your model?
# Remove the outliers, or replace them with the mean.
################## Part 2: Sampling and Point Estimation #####################
## The following problems will use the cats dataset and explore
## the average body weight of female cats.
## Load the data by running the following code
#install.packages("MASS")
library(MASS)
## Warning: package 'MASS' was built under R version 3.3.3
data(cats)
## 4. (2pts)
## Subset the data frame to ONLY include female cats.
cats = cats[cats$Sex == "F", ]
## Use the sample function to generate a vector of 1s and 2s that is the same
## length as the subsetted data frame you just created. Use this vector to split
## the 'Bwt' variable into two vectors, Bwt1 and Bwt2.
## IMPORTANT: Make sure to run the following seed function before you run your sample
## function. Run them back to back each time you want to run the sample function to ensure
## the same seed is used every time.
## Check: If you did this properly, you will have 24 elements in Bwt1 and 23 elements
## in Bwt2.
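set.seed(676)
s1 = sample(length(cats$Bwt), 24)
# Use the same index vector s1 for both halves so that Bwt1 and Bwt2 do not overlap.
# (The Bwt1 results printed below were obtained with a second, independent draw.)
Bwt1 = cats$Bwt[s1]
Bwt2 = cats$Bwt[-s1]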
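# The question literally asks for a vector of 1s and 2s. One possible reading
# is sketched below on a stand-in vector; bwt, group and parts are hypothetical
# names, the weights are simulated rather than taken from cats, and this split
# is not expected to reproduce the numbers printed in this document.

```r
# Stand-in for cats$Bwt: 47 hypothetical body weights (not the real data).
set.seed(676)
bwt <- round(rnorm(47, mean = 2.36, sd = 0.27), 1)

# A shuffled vector of 1s and 2s, as long as the data: 24 ones and 23 twos.
group <- sample(rep(1:2, length.out = length(bwt)))

parts <- split(bwt, group)   # list with elements "1" and "2"
bwt1 <- parts[["1"]]
bwt2 <- parts[["2"]]
lengths(parts)
```

# Because every element gets exactly one label, the two vectors never overlap
# and together contain all 47 values.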
## 5. (3pts)
## Calculate the mean and the standard deviation for each of the two
## vectors, Bwt1 and Bwt2. Use this information to create a 95%
## confidence interval for your sample means (you can use the following formula
## for a confidence interval: mean +/- 2 * standard deviation).
## Compare the confidence intervals -- do they seem to agree or disagree?
mean(Bwt1)
## [1] 2.3375
mean(Bwt2)
## [1] 2.395652
sd(Bwt1)
## [1] 0.2617873
sd(Bwt2)
## [1] 0.2754802
# confidence interval
mean(Bwt1) + 2 * sd(Bwt1)
## [1] 2.861075
mean(Bwt1) - 2 * sd(Bwt1)
## [1] 1.813925
mean(Bwt2) + 2 * sd(Bwt2)
## [1] 2.946613
mean(Bwt2) - 2 * sd(Bwt2)
## [1] 1.844692
# The two intervals (about 1.81 to 2.86 and 1.84 to 2.95) overlap almost completely, so they agree.
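# The exercise allows mean +/- 2 * standard deviation, which describes the
# spread of individual weights. A confidence interval for the mean is more
# commonly built from the standard error, sd / sqrt(n). The sketch below
# applies that textbook formula to the Bwt1 summary values printed above; it
# is offered as a comparison, not as the answer the exercise expects.

```r
# Summary statistics for Bwt1 as printed above; n = 24 comes from the exercise.
m <- 2.3375
s <- 0.2617873
n <- 24

se <- s / sqrt(n)                    # standard error of the mean
t_crit <- qt(0.975, df = n - 1)      # two-sided 95% critical value
ci <- c(lower = m - t_crit * se, upper = m + t_crit * se)
ci
```

# This interval is much narrower than the one from mean +/- 2 * sd, because
# it measures uncertainty about the mean rather than about single cats.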
## 6.
## Draw 1000 observations from a standard normal distribution. Calculate the sample mean.
## Repeat this 500 times, storing each sample mean in a vector called mean_dist.
## Plot a histogram of mean_dist to display the distribution of your sample mean.
## How closely does your histogram resemble this normal distribution? Explain.
mean_dist = numeric(500)   # one slot per repetition
for (i in 1:500) {         # the question asks for 500 repetitions
  x = rnorm(1000)
  mean_dist[i] = mean(x)
}
hist(mean_dist)
# The histogram of the sample means is approximately normal and centered near 0.
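# A numeric check can back up the visual impression: for samples of size n
# from a standard normal, the sample means should have standard deviation
# close to 1 / sqrt(n). The sketch below uses replicate() and an arbitrary
# seed; observed_sd and expected_sd are names chosen for this illustration.

```r
set.seed(1)                      # arbitrary seed, chosen for this sketch only
n <- 1000
reps <- 500
mean_dist <- replicate(reps, mean(rnorm(n)))

observed_sd <- sd(mean_dist)
expected_sd <- 1 / sqrt(n)       # theoretical standard error, about 0.0316
c(observed = observed_sd, expected = expected_sd)
```

# The two values should be close, which is why the histogram is so much
# narrower than the standard normal the observations were drawn from.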
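## 7. (3pts)
## Write a function that implements Q5.
HW.Bootstrap = function(distn, n, reps) {
  set.seed(666)
  ### Your answer here
  mean_dist = numeric(reps)
  for (i in 1:reps) {
    if (distn == "rexp") {
      x <- rexp(n, 1)    # exponential distribution with mean 1
    } else {
      x <- rnorm(n)      # assumed fallback: standard normal, as in the experiment above
    }
    mean_dist[i] = mean(x)
  }
  hist(mean_dist)
  invisible(mean_dist)   # return the means without printing them
}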
## Use the function you write to repeat the experiment in Q5 but instead of the
## normal distribution as we used above, use an exponential distribution with mean 1.
## Check your histogram and write out your findings.
HW.Bootstrap(distn = "rexp", n = 1000, reps = 500)
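# The histogram of sample means from the exponential distribution is also approximately normal, centered near 1, as the central limit theorem predicts.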
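# An alternative design passes the random generator itself instead of its name
# as a string, so one helper covers rnorm, rexp and others. This is a sketch
# only: sample_means and rdist are hypothetical names, not the HW.Bootstrap
# function the exercise asks for.

```r
# Hypothetical helper: rdist is any random generator whose first argument is
# the sample size, e.g. rnorm or rexp. Extra arguments are passed through.
sample_means <- function(rdist, n, reps, ...) {
  replicate(reps, mean(rdist(n, ...)))
}

set.seed(666)
exp_means <- sample_means(rexp, n = 1000, reps = 500, rate = 1)
c(mean = mean(exp_means), sd = sd(exp_means))   # theory: 1 and 1 / sqrt(1000)
# hist(exp_means)   # uncomment to draw the histogram
```

# Keeping the seed outside the helper lets the caller decide when results
# should be reproducible.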
################### Part 3: More Linear Regression ######################
## This problem will use the Prestige dataset.
## Load the data by running code below
#install.packages("car")
library(car)
data(Prestige)
head(Prestige)
##                     education income women prestige census type
## gov.administrators      13.11  12351 11.16     68.8   1113 prof
## general.managers        12.26  25879  4.02     69.1   1130 prof
## accountants             12.77   9271 15.70     63.4   1171 prof
## purchasing.officers     11.42   8865  9.11     56.8   1175 prof
## chemists                14.62   8403 11.68     73.5   2111 prof
## physicists              15.64  11030  5.13     77.6   2113 prof
## We will focus on these two variables:
## income: Average income of incumbents, dollars, in 1971.
## education: Average education of occupational incumbents, years, in 1971
## Before starting this problem, we will declare a null hypothesis that
## education has no effect on income.
## That is: H0: B1 = 0
## HA: B1 != 0
## We will attempt to reject this hypothesis by using a linear regression
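## 8. (2pt)
## Fit a linear regression on the Prestige data using education to predict
## income, using lm(). Examine the model diagnostics using plot(). Would you
## consider this a good model or not?
# Note: income ~ . regresses income on every other column, not on education alone.
mm <- lm(income ~ ., data = Prestige)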
plot(mm)
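# The diagnostic plots show some outlying points, and the points on the Q-Q plot do not follow the reference line, so the residuals are not normally distributed. The model fit is therefore only mediocre.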
## 9. (2pts)
## Using the information from summary() on your model (the output from the lm() command), create a
## 95% confidence interval for the coefficient of the education variable
summary(mm)
##
## Call:
## lm(formula = income ~ ., data = Prestige)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7752.4  -954.6  -331.2   742.6 14301.3
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    7.32053 3037.27048   0.002  0.99808
## education    131.18372  288.74961   0.454  0.65068
## women        -53.23480    9.83107  -5.415 4.96e-07 ***
## prestige     139.20912   36.40239   3.824  0.00024 ***
## census         0.04209    0.23568   0.179  0.85865
## typeprof     509.15150 1798.87914   0.283  0.77779
## typewc       347.99010 1173.89384   0.296  0.76757
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2633 on 91 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.6363, Adjusted R-squared:  0.6123
## F-statistic: 26.54 on 6 and 91 DF,  p-value: < 2.2e-16
# 95% confidence interval
confint(mm)
##                     2.5 %       97.5 %
## (Intercept) -6025.8441606 6040.4852295
## education    -442.3818984  704.7493459
## women         -72.7630052  -33.7065943
## prestige       66.9002455  211.5179870
## census         -0.4260509    0.5102307
## typeprof    -3064.1009370 4082.4039336
## typewc      -1983.8057989 2679.7860021
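# The education interval can also be rebuilt by hand from the summary() table,
# which is what the question asks for. The sketch below copies the estimate,
# standard error and residual degrees of freedom from the output above; the
# variable names are chosen for this illustration.

```r
# Values copied from the summary() output above.
estimate <- 131.18372
std_error <- 288.74961
df_resid <- 91

t_crit <- qt(0.975, df = df_resid)             # two-sided 95% critical value
ci <- estimate + c(-1, 1) * t_crit * std_error
names(ci) <- c("2.5 %", "97.5 %")
ci
```

# The result should match the education row of the interval table above up to
# rounding of the copied inputs.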
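## 10. (2pts)
## Based on the result from question 9, would you reject the null hypothesis or not?
## (Assume a significance level of 0.05). Explain.
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)   7.32053 3037.27048   0.002  0.99808
# education   131.18372  288.74961   0.454  0.65068
# The p-value for education (0.65068) is greater than 0.05, and its 95% confidence interval contains 0, so we fail to reject the null hypothesis: in this model, which also adjusts for the other predictors, education has no significant effect on income.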
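## 11. (1pt)
## Assuming that the null hypothesis is true.
## Based on your decision in the previous question, would you be committing a decision error?
## If so, which type of error?
# No. We failed to reject H0 and H0 is assumed to be true, so the decision is correct. A Type I error would mean rejecting a true null hypothesis (a "false positive"), whereas a Type II error means incorrectly retaining a false null hypothesis (a "false negative").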
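# What a significance level of 0.05 means when the null hypothesis is true can
# be illustrated by simulation. The sketch below uses made-up data in which y
# is unrelated to x and counts how often the slope is declared significant
# anyway. All names and the seed are assumptions made for this example.

```r
set.seed(1)        # arbitrary seed for this sketch
n_sims <- 1000
n_obs <- 50
alpha <- 0.05

# Each run simulates data in which y is unrelated to x (so H0: B1 = 0 holds)
# and records the p-value of the slope.
p_values <- replicate(n_sims, {
  x <- rnorm(n_obs)
  y <- rnorm(n_obs)
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

type1_rate <- mean(p_values < alpha)   # share of false rejections
type1_rate
```

# The share of false rejections should be close to alpha, which is the Type I
# error rate the test is designed to have.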
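## 12. (1pt)
## Discuss what your regression results mean in the context of the data.
## (Think back to Question 1)
# The results show that women (the percentage of women in an occupation) and prestige have significant effects on income: higher prestige goes with higher income, while a higher share of women goes with lower income.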