Computer Science Literature Translation: Research on Machine Learning
Machine-Learning Research
Four Current Directions
Thomas G. Dietterich
■ Machine-learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are
(1) the improvement of classification accuracy by learning ensembles of classifiers,
(2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.
The last five years have seen an explosion in machine-learning research. This explosion has many causes:
First, separate research communities in symbolic machine learning, computational learning theory, neural networks, statistics, and pattern recognition have discovered one another and begun to work together. Second, machine-learning techniques are being applied to new kinds of problems, including knowledge discovery in databases, language processing, robot control, and combinatorial optimization, as well as to more traditional problems such as speech recognition, face recognition, handwriting recognition, medical data analysis, and game playing.
In this article, I selected four topics within machine learning where there has been a lot of recent activity. The purpose of the article is to describe the results in these areas to a broader AI audience and to sketch some of the open research problems. The topic areas are
(1) ensembles of classifiers,
(2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.
The reader should be cautioned that this article is not a comprehensive review of each of these topics. Rather, my goal is to provide a representative sample of the research in each of these four areas. In each of the areas, there are many other papers that describe relevant work. I apologize to those authors whose work I was unable to include in the article.
Ensembles of Classifiers
The first topic concerns methods for improving accuracy in supervised learning. I begin by introducing some notation. In supervised learning, a learning program is given training examples of the form {(x_1, y_1), …, (x_m, y_m)} for some unknown function y = f(x). The x_i values are typically vectors of the form ⟨x_{i,1}, x_{i,2}, …, x_{i,n}⟩ whose components are discrete or real valued.
The y values are typically drawn from a discrete set of classes {1, …, k} in the case of classification or from the real line in the case of regression. In this article, I focus primarily on classification. The training examples might be corrupted by some random noise.
Given a set S of training examples, a learning algorithm outputs a classifier. The classifier is a hypothesis about the true function f. Given new x values, it predicts the corresponding y values. I denote classifiers by h_1, …, h_L.
An ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way (typically by weighted or unweighted voting) to classify new examples. One of the most active areas of research in supervised learning has been the study of methods for constructing good ensembles of classifiers. The main discovery is that ensembles are often much more accurate than the individual classifiers that make them up.
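To make these definitions concrete, here is a minimal Python sketch (my own illustration; the toy classifiers and the name majority_vote are hypothetical, not from the article) of an ensemble whose individual decisions are combined by unweighted voting:

    from collections import Counter

    def majority_vote(classifiers, x):
        """Classify x by an unweighted vote over the ensemble."""
        votes = Counter(h(x) for h in classifiers)
        return votes.most_common(1)[0][0]

    # Toy component classifiers over two-dimensional feature vectors.
    h1 = lambda x: 1 if x[0] > 0.5 else 0
    h2 = lambda x: 1 if x[1] > 0.5 else 0
    h3 = lambda x: 1 if x[0] + x[1] > 1.0 else 0

    print(majority_vote([h1, h2, h3], (0.9, 0.2)))  # h1 and h3 vote 1 -> prints 1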
An ensemble can be more accurate than its component classifiers only if the individual classifiers disagree with one another (Hansen and Salamon 1990). To see why, imagine that we have an ensemble of three classifiers:
{h1, h2, h3}, and consider a new case x. If the three classifiers are identical, then when h1(x) is wrong, h2(x) and h3(x) are also wrong. However, if the errors made by the classifiers are uncorrelated, then when h1(x) is wrong, h2(x) and h3(x) might be correct, so that a majority vote correctly classifies x. More precisely, if the error rates of L hypotheses h_l are all equal to p < 1/2 and if the errors are independent, then the probability that the majority vote is wrong is the area under the binomial distribution where more than L/2 hypotheses are wrong:

P(majority vote is wrong) = Σ_{k > L/2} C(L, k) p^k (1 − p)^(L − k)

(A numeric check of this formula appears in the first sketch following this section.) Of course, if the individual hypotheses make uncorrelated errors at rates exceeding 0.5, then the error rate of the voted ensemble increases as a result of the voting. Hence, the key to successful ensemble methods is to construct individual classifiers with error rates below 0.5 whose errors are at least somewhat uncorrelated.

Methods for Constructing Ensembles

Many methods for constructing ensembles have been developed. Some methods are general, and they can be applied to any learning algorithm. Other methods are specific to particular algorithms. I begin by reviewing the general techniques.

Subsampling the Training Examples

The first method manipulates the training examples to generate multiple hypotheses. The learning algorithm is run several times, each time with a different subset of the training examples. This technique works especially well for unstable learning algorithms, that is, algorithms whose output classifier undergoes major changes in response to small changes in the training data. Decision-tree, neural-network, and rule-learning algorithms are all unstable. Linear-regression, nearest-neighbor, and linear-threshold algorithms are generally stable.

The most straightforward way of manipulating the training set is called bagging (a short sketch follows this section). On each run, bagging presents the learning algorithm with a training set that consists of a sample of m training examples drawn randomly with replacement from the original training set of m items. Such a training set is called a bootstrap replicate of the original training set, and the technique is called bootstrap aggregation (Breiman 1996a). Each bootstrap replicate contains, on the average, 63.2 percent of the original training set, with several training examples appearing multiple times.

Another training-set sampling method is to construct the training sets by leaving out disjoint subsets. For example, the training set can be divided randomly into 10 disjoint subsets. Then, 10 overlapping training sets can be constructed by dropping out a different one of these 10 subsets. This same procedure is used to construct training sets for tenfold cross-validation; so, ensembles constructed in this way are sometimes called cross-validated committees (Parmanto, Munro, and Doyle 1996).

The third method for manipulating the training set is illustrated by the ADABOOST algorithm, developed by Freund and Schapire (1996, 1995) and shown in figure 2. Like bagging, ADABOOST manipulates the training examples to generate multiple hypotheses. ADABOOST maintains a probability distribution p_i(x) over the training examples. In each iteration i, it draws a training set of size m by sampling with replacement according to the probability distribution p_i(x). The learning algorithm is then applied to produce a classifier h_i. The error rate ε_i of this classifier on the training examples (weighted according to p_i(x)) is computed and used to adjust the probability distribution on the training examples. (In figure 2, note that the probability distribution is obtained by normalizing a set of weights w_i over the training examples.)

The effect of the change in weights is to place more weight on examples that were misclassified by h_i and less weight on examples that were correctly classified. In subsequent iterations, therefore, ADABOOST constructs progressively more difficult learning problems.

The final classifier, h_f, is constructed by a weighted vote of the individual classifiers. Each classifier is weighted according to its accuracy for the distribution p_i that it was trained on. (A compact rendering of the whole loop appears in the last sketch following this section.)
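As a numeric check of the binomial formula above, the following short Python sketch (my own, not from the article) computes the majority-vote error for an ensemble of L = 21 independent hypotheses, each with individual error rate p = 0.3:

    from math import comb

    L, p = 21, 0.3

    # Probability that more than L/2 hypotheses are simultaneously wrong,
    # i.e., the area under the binomial distribution beyond L/2.
    ensemble_error = sum(comb(L, k) * p**k * (1 - p)**(L - k)
                         for k in range(L // 2 + 1, L + 1))
    print(round(ensemble_error, 3))  # about 0.026, far below the individual 0.3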
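Bagging itself takes only a few lines. The sketch below is a minimal rendering under stated assumptions: learn is a placeholder for any base learning algorithm that maps a list of (x, y) training pairs to a classifier, and the returned ensemble can be combined with majority_vote from the earlier sketch:

    import random

    def bagging(learn, examples, num_classifiers=10):
        """Train each classifier on a bootstrap replicate: m examples drawn
        randomly with replacement from the original m training examples.
        On average, a replicate contains about 63.2 percent of the distinct
        original examples, with some appearing multiple times."""
        m = len(examples)
        return [learn([random.choice(examples) for _ in range(m)])
                for _ in range(num_classifiers)]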
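The ADABOOST.M1 loop described above can likewise be sketched compactly. This is my own rendering of the published pseudocode (figure 2 itself is not reproduced here), using the resampling variant discussed in the next passage; learn is again a placeholder base learner, and examples is a list of (x, y) pairs:

    import math
    import random

    def adaboost_m1(learn, examples, rounds=10):
        """Sketch of ADABOOST.M1 (Freund and Schapire 1995, 1996)."""
        m = len(examples)
        weights = [1.0 / m] * m               # w_i, normalized below into p_i
        hypotheses, alphas = [], []
        for _ in range(rounds):
            total = sum(weights)
            p = [w / total for w in weights]  # the distribution p_i(x)
            sample = random.choices(examples, weights=p, k=m)
            h = learn(sample)
            # Error rate eps_i of h on the training examples, weighted by p_i.
            eps = sum(pi for pi, (x, y) in zip(p, examples) if h(x) != y)
            if eps >= 0.5:                    # no better than chance: stop
                break
            eps = max(eps, 1e-12)             # guard for a perfect hypothesis
            beta = eps / (1 - eps)
            # Down-weight correctly classified examples; misclassified ones
            # keep their weight, making later rounds progressively harder.
            weights = [w * beta if h(x) == y else w
                       for w, (x, y) in zip(weights, examples)]
            hypotheses.append(h)
            alphas.append(math.log(1 / beta))  # voting weight: accuracy counts

        def final_classifier(x):              # weighted vote of the h_i
            votes = {}
            for h, a in zip(hypotheses, alphas):
                votes[h(x)] = votes.get(h(x), 0.0) + a
            return max(votes, key=votes.get)
        return final_classifier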
In line 4 of the ADABOOST algorithm (figure 2), the base learning algorithm Learn is called with the probability distribution p_i. If the learning algorithm Learn can use this probability distribution directly, then this procedure generally gives better results. For example, Quinlan (1996) developed a version of the decision tree-learning program C4.5 that works with a weighted training sample. His experiments showed that it worked extremely well. One can also imagine versions of backpropagation that scaled the computed output error for training example (x_i, y_i) by the weight p_i. Errors for important training examples would cause larger gradient-descent steps than errors for unimportant (low-weight) examples (a toy version of this idea is sketched at the end of this section).

However, if the algorithm cannot use the probability distribution p_i directly, then a training sample can be constructed by drawing a random sample with replacement in proportion to the probabilities p_i. This procedure makes ADABOOST more stochastic, but experiments have shown that it is still effective.

Figure 3 compares the performance of C4.5 to C4.5 with ADABOOST.M1 (using random sampling). One point is plotted for each of 27 test domains taken from the Irvine repository of machine-learning databases (Merz and Murphy 1996). We can see that most points lie above the line y = x, which indicates that the error rate of ADABOOST is less than the error rate of C4.5. Figure 4 compares the performance of bagging (with C4.5) to C4.5 alone. Again, we see that bagging produces sizable reductions in the error rate of C4.5 for many problems. Finally, figure 5 compares bagging with boosting (both using C4.5 as the underlying algorithm). The results show that the two techniques are comparable, although boosting appears to still have an advantage over bagging.
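To illustrate the weighted-error idea from the first paragraph of this passage, here is a toy Python sketch (my own illustration; not Quinlan's weighted C4.5, and far simpler than backpropagation): a linear threshold unit trained by gradient-style updates in which each example's error signal is scaled by its probability p_i, so high-weight examples cause larger steps:

    def learn_weighted_linear(examples, p, epochs=100, lr=0.1):
        """Toy learner that uses the distribution p directly: the update
        for example i is scaled by p[i], the analog of scaling the computed
        output error in backpropagation."""
        n = len(examples[0][0])
        w, b = [0.0] * n, 0.0
        for _ in range(epochs):
            for (x, y), pi in zip(examples, p):
                pred = 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
                err = (y - pred) * pi        # error signal weighted by p_i
                w = [wj + lr * err * xj for wj, xj in zip(w, x)]
                b += lr * err
        return lambda x: 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

A learner of this form could be passed directly as the learn argument of the adaboost_m1 sketch above, skipping the resampling step.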