Generalized Linear Models
2014.11.14, Week 12
Data Mining and Analysis

Content
• Generalized linear models
• Predicting categorical variables: Logistic regression
• Modeling count data: Poisson regression

Generalized linear models
• Linear models assume a normally distributed response variable.
• Generalized linear models handle responses that are:
  – categorical
  – counts (where the variance is related to the mean)

Generalized linear models
• Standard linear model:
  $$\mu_Y = \beta_0 + \sum_{j=1}^{p} \beta_j X_j$$
• Generalized linear model:
  $$g(\mu_Y) = \beta_0 + \sum_{j=1}^{p} \beta_j X_j$$
• Link function $g$: a function of the conditional mean $\mu_Y$.

Generalized linear models: glm()
• glm(formula, family = family(link = function), data = )
• Table 13-1: glm() parameters
  – distribution families and their default link functions
  – e.g. glm(Y ~ X1 + X2 + X3, family = gaussian(link = "identity"), data = mydata)
• Used together with the following functions:
  – summary(), coefficients(), coef(), confint(),
  – residuals(), anova(), plot(), predict()

Generalized linear models: model fitting and diagnostics
• Assessing model adequacy:
  – ?
  – Plot of predicted response values against residuals
    • plot(predict(model, type = "response"), residuals(model, type = "deviance"))
    • Hat values, studentized residuals, Cook's distance ...
    • Combined diagnostic plot (car::influencePlot)

Generalized linear models: Logistic regression
• Suitable for a binary response variable (0, 1)
• glm(Y ~ X1 + X2 + X3, family = binomial(link = "logit"), data = mydata)
  $$\ln\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \sum_{j=1}^{p} \beta_j X_j$$
  – $\pi/(1-\pi)$ is the odds of Y = 1.
  – $\pi = \mu_Y$, the conditional mean of Y given X, is the probability that Y = 1.

Generalized linear models: Logistic regression
• Predict a binary outcome from a set of continuous and/or categorical predictors (example: Fair's Affairs data)
• Frequency of affairs ~ gender + age + years married + children + religiousness + education + occupation + self-rating of marriage

    > data(Affairs, package = "AER")
    > str(Affairs)
    'data.frame':   601 obs. of  9 variables:
     $ affairs      : num  0 0 0 0 0 0 0 0 0 0 ...
     $ gender       : Factor w/ 2 levels "female","male": 2 1 1 2 2 1 1 2 1 2 ...
     $ age          : num  37 27 32 57 22 32 22 57 32 22 ...
     $ yearsmarried : num  10 4 15 15 0.75 1.5 0.75 15 15 1.5 ...
     $ children     : Factor w/ 2 levels "no","yes": 1 1 2 2 1 1 1 2 2 1 ...
     $ religiousness: int  3 4 1 5 2 2 2 2 4 4 ...
     $ education    : num  18 14 12 18 17 17 12 14 16 14 ...
     $ occupation   : int  7 6 1 6 6 5 1 4 1 4 ...
     $ rating       : int  4 4 4 5 3 5 3 4 2 5 ...

Generalized linear models: Logistic regression

    > summary(Affairs)

    Numeric variables:
                     Min.   1st Qu.  Median   Mean    3rd Qu.  Max.
    affairs          0.000   0.000    0.000   1.456    0.000   12.000
    age             17.50   27.00    32.00   32.49    37.00    57.00
    yearsmarried     0.125   4.000    7.000   8.178   15.000   15.000
    religiousness    1.000   2.000    3.000   3.116    4.000    5.000
    education        9.00   14.00    16.00   16.17    18.00    20.00
    occupation       1.000   3.000    5.000   4.195    6.000    7.000
    rating           1.000   3.000    4.000   3.932    5.000    5.000

    Factors:
    gender     female: 315   male: 286
    children   no: 171       yes: 430

    > table(Affairs$affairs)
      0   1   2   3   7  12
    451  34  17  19  42  38

Generalized linear models: Logistic regression
Recode the affair count into a binary factor:

    > Affairs$ynaffair[Affairs$affairs > 0] <- 1
    > Affairs$ynaffair[Affairs$affairs == 0] <- 0
    > Affairs$ynaffair <- factor(Affairs$ynaffair, levels = c(0, 1), labels = c("No", "Yes"))
    > table(Affairs$ynaffair)
     No Yes
    451 150

Generalized linear models: Logistic regression (fit full model)

    > fit.full <- glm(ynaffair ~ gender + age + yearsmarried +
    +                 children + religiousness + education + occupation + rating,
    +                 data = Affairs, family = binomial())
    > summary(fit.full)    (partial output)
    Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
    (Intercept)     1.37726    0.88776   1.551 0.120807
    gendermale      0.28029    0.23909   1.172 0.241083
    age            -0.04426    0.01825  -2.425 0.015301 *
    yearsmarried    0.09477    0.03221   2.942 0.003262 **
    childrenyes     0.39767    0.29151   1.364 0.172508
    religiousness  -0.32472    0.08975  -3.618 0.000297 ***
    education       0.02105    0.05051   0.417 0.676851
    occupation      0.03092    0.07178   0.431 0.666630
    rating         -0.46845    0.09091  -5.153 2.56e-07 ***

Generalized linear models: Logistic regression (fit reduced model)

    > fit.reduced <- glm(ynaffair ~ age + yearsmarried +
    +                    religiousness + rating, data = Affairs, family = binomial())
    > summary(fit.reduced)    (partial output)
                   Estimate Std. Error z value Pr(>|z|)
    (Intercept)     1.93083    0.61032   3.164 0.001558 **
    age            -0.03527    0.01736  -2.032 0.042127 *
    yearsmarried    0.10062    0.02921   3.445 0.000571 ***
    religiousness  -0.32902    0.08945  -3.678 0.000235 ***
    rating         -0.46136    0.08884  -5.193 2.06e-07 ***

Generalized linear models: Logistic regression (compare full and reduced)

    > anova(fit.reduced, fit.full, test = "Chisq")
    Analysis of Deviance Table
    Model 1: ynaffair ~ age + yearsmarried + religiousness + rating
    Model 2: ynaffair ~ gender + age + yearsmarried + children + religiousness +
        education + occupation + rating
      Resid. Df Resid. Dev Df Deviance Pr(>Chi)
    1       596     615.36
    2       592     609.51  4   5.8474   0.2108

The chi-square test is not significant (p = 0.2108), so the four dropped predictors do not significantly improve the fit and the reduced model can be used.

Generalized linear models: Logistic regression (interpreting model parameters)
• Because of the logit link, the response is modeled as the log odds of Y = 1.
• Holding the other variables fixed, a one-unit change in a predictor changes the log odds by its coefficient; exponentiating the coefficient gives the multiplicative change in the odds of Y = 1.

    > coef(fit.reduced)
      (Intercept)           age  yearsmarried religiousness        rating
       1.93083017   -0.03527112    0.10062274   -0.32902386   -0.46136144
    > exp(coef(fit.reduced))
      (Intercept)           age  yearsmarried religiousness        rating
        6.8952321     0.9653437     1.1058594     0.7196258     0.6304248

Generalized linear models: Logistic regression (model prediction)

    > testdata <- data.frame(rating = c(1, 2, 3, 4, 5),
    +                        age = mean(Affairs$age), yearsmarried = mean(Affairs$yearsmarried),
    +                        religiousness = mean(Affairs$religiousness))
    > testdata$prob <- predict(fit.reduced, newdata = testdata,
    +                          type = "response")
    > testdata
      rating      age yearsmarried religiousness      prob
    1      1 32.48752     8.177696      3.116473 0.5302296
    2      2 32.48752     8.177696      3.116473 0.4157377
    3      3 32.48752     8.177696      3.116473 0.3096712
    4      4 32.48752     8.177696      3.116473 0.2204547
    5      5 32.48752     8.177696      3.116473 0.1513079
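As a hand check on the prediction output, the coefficients reported by coef(fit.reduced) can be plugged into the logit equation for rating = 1, with age, yearsmarried and religiousness set to the means shown in testdata. The symbols $\eta$ (linear predictor) and $\pi$ (probability of Y = 1) are notation introduced here for illustration, and all values are rounded.

```latex
\begin{align*}
\eta &= 1.930830 - 0.035271(32.48752) + 0.100623(8.177696)
        - 0.329024(3.116473) - 0.461361(1) \\
     &\approx 1.93083 - 1.14587 + 0.82286 - 1.02539 - 0.46136 \approx 0.1211, \\
\pi  &= \frac{1}{1 + e^{-\eta}} \approx \frac{1}{1 + 0.886} \approx 0.530 .
\end{align*}

% Odds at rating = 1 and rating = 2, taken from the predicted probabilities:
\begin{align*}
\frac{0.5302}{1 - 0.5302} \approx 1.129, \qquad
\frac{0.4157}{1 - 0.4157} \approx 0.712, \qquad
\frac{0.712}{1.129} \approx 0.630 \approx e^{-0.46136}.
\end{align*}
```

The first result agrees with the predicted probability 0.5302296 for rating = 1. The second shows that raising rating by one unit multiplies the odds by about 0.63, which is the value exp(coef(fit.reduced)) reports for rating.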
Document title: 数据挖掘和分析16_W12B (Data Mining and Analysis 16_W12B)