The Relationship and Distinction
Between Big Data and Data Mining
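Instructor:
************
Major:
Computer Technology
School:
School of Computer Science and Engineering
Graduate School, Guilin University of Electronic Technology
Date: **-*-* (year-month-day)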
The Relationship and Distinction Between Big Data and Data Mining
Student ID:
*
Name:
Adviser:
Guilin University of Electronic Technology
**,*
Abstract:
In this paper, data mining is discussed in the context of big data. Firstly, we elaborate on the fact that big data has drawn major attention from the academic community, industry and governments. Secondly, the adverse effects of big data are discussed, such as abundant garbage, heavy pollution and difficulties in utilization. Finally, we dissect the value in big data, expound the techniques for discovering knowledge from big data, and investigate the transformation from knowledge into data intelligence.
Keywords:
big data;
data mining;
data intelligence
1. Introduction
As data volumes continue to increase exponentially, the data tsunami can easily overwhelm traditional analytics tools or platforms designed to ingest, analyze and report.
Every day, 2.5 quintillion bytes of data are created, and 90 percent of the data in the world today were produced within the past two years [1]. The challenge we face is not only how to store and manage diverse data but also how to analyze the data effectively to gain insight and knowledge for making smarter decisions.
Currently, a number of works have been presented. These studies introduce big data, mining and analysis from different aspects, such as the status quo, ideas or implementations. For example, one work introduces the “Lambda Architecture”, which provides a general-purpose approach to implementing arbitrary functions on massive datasets in real time; in another, a scalable deep analytics platform has been implemented.
Because of the complexity, there is no single tool or one-size-fits-all solution for deeply mining and analyzing big data. Moreover, extracting valuable knowledge from massive datasets requires further studies and experiments, as well as scalable and smart services, programming tools and applications.
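The Lambda Architecture mentioned in this section can be illustrated with a toy sketch. The class and method names below (`LambdaCounter`, `ingest`, `run_batch`, `query`) are hypothetical and not taken from the cited work; the sketch only assumes the commonly described split into a batch layer that recomputes a view from the full dataset, a speed layer that updates incrementally, and a serving step that merges the two.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts events per key (illustrative only)."""

    def __init__(self):
        self.master = []             # append-only master dataset
        self.batch_view = Counter()  # view precomputed by the batch layer
        self.speed_view = Counter()  # incremental view of events since the last batch run

    def ingest(self, key):
        # New data is appended to the master dataset and also
        # applied immediately to the real-time (speed) view.
        self.master.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch over all data,
        # then drop the real-time view that the new batch view supersedes.
        self.batch_view = Counter(self.master)
        self.speed_view = Counter()

    def query(self, key):
        # Serving step: merge the batch view with the real-time view.
        return self.batch_view[key] + self.speed_view[key]


lam = LambdaCounter()
for page in ["home", "search", "home"]:
    lam.ingest(page)
before_batch = lam.query("home")   # answered from the speed view only
lam.run_batch()
lam.ingest("home")
after_batch = lam.query("home")    # batch view plus one recent event
```

In a real deployment the batch recomputation would run on a distributed system over a very large dataset, which is why the cheap incremental view is needed to cover the most recent data between batch runs.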
The remainder of this paper is structured as follows. Section 2 elaborates on the fact that big data plays a primary role in every field. Then the adverse effects of big data are discussed in Section 3. After analyzing the value of big data, we introduce the related knowledge and development of data mining in Section 5. In Section 6, the effectiveness of data mining is introduced. Finally, the conclusion follows.
2. About big data
Big data is a complex dataset with the following main characteristics: Volume, Variety, Velocity and Veracity [2][3]. These characteristics make it difficult to manage and manipulate with existing tools. Among these data, big data specifically accounts for the vast majority.
Big data is the basis of data and the source of wisdom for people to understand the real world through the information world.
Big data is closely related to applications [4][5], and big data mining is its principal application.
2.1 From understanding the real world to creating the information world
Human civilization is a process from understanding the real world to creating the information world, which has gone through the following stages: preliminary sensing of the world, aiding memory with information, recording and inheriting through information, exchange and communication through information, and understanding the world once again through information. Initially, humans used stones and shells to count according to the principle of one-to-one correspondence, and they tied knots as notes to aid memory. Later, humans used simple graphics and drawn notes, and passed on more accurate memories through their own emotional prompts. When the graphics took on a relatively fixed form as common symbols and became associated with the words of the language, texts were produced. Texts abstract and generalize the world, promote cultural understanding, and lay the necessary foundation for the development of science. To break through the restriction that written symbols depended on manual copying or engraving, humans after the Industrial Revolution used machines for mechanized volume production, which improved the efficiency of cultural transmission.
The computer centers on high-speed computing and separates software from hardware, contributing to the dissemination of information “electronically” and “automatically”. The Internet centers on the network and interconnects computers, breaking local restrictions on information. Mobile communication centers on users, making the machine follow the user's movements and unbinding humans from the machine. The Internet of Things centers on applications and automatically identifies objects to enable information sharing between humans and things. Cloud computing centers on services by consolidating expertise and optimizing the allocation of resources.
Big data centers on data and mines knowledge from the entire data, breaking the sampling randomness of the sample [6][7], and is demonstrated on big data centers and mobile terminals.
These information technologies serve the understanding and transformation of the real world.
2.2 Big data is attracting much attention
Just as humans explore the real world through scientific research, they unravel the mysteries of the information world through big data and data mining, which are attracting much attention from academia. In May 2011, McKinsey published “Big data: the next frontier for innovation, competition, and productivity”, which analyzed the application potential of big data in different industries from the economic and commercial dimensions and spelled out development policy for government and industry decision makers dealing with big data.
In January 2012, the “Wall Street Journal” argued that big data, smart production and wireless networks will lead to new economic prosperity [8].
In March 2012, the United States government released the “Big Data Research and Development Initiative”, which raised the development and application of big data from a business practice to a national strategic deployment, in order to improve the ability to extract knowledge from large and complex data and to help solve some of the nation's most pressing challenges.
In April 2012, “Nature Biotechnology” invited eight biologists to evaluate an article published in “Science” in December 2011, titled “Detecting Novel Associations in Large Data Sets”, in a paper titled “Finding correlations in big data”.
In July 2012, Gartner released the first data survey report, “Hype Cycle for Big Data, 2012”, which examined big data in depth [9].
In China [10], big data attracts as much attention as it does around the world. Baidu has used Hadoop for off-line processing since 2007. Currently, Baidu has over 10,000 Hadoop servers, more than Yahoo and Facebook, and it plans to reach 20,000 in 2013. Of these servers, 80% of the Hadoop clusters process a total of 6 TB of data every day for log analysis. Tencent, Taobao and Alipay also use Hadoop to build data warehouses and handle big data. In April 2010, Taobao launched a data mining platform, “data cube”, based on a hundred-billion-record-level database named OceanBase, which supports 4 to 5 million update operations, covering over 2 billion records and more than 2.5 TB of data, in one day. In May 2010, China Mobile established a massive distributed system and a structured mass data management system on the cloud. Huawei analyzes data from mobile terminals and stores massive data in the cloud to obtain valuable information. Alibaba analyzes business transaction data with big data technology for credit approval.
3. Big data disaster
Big data is closely related to daily human life and has permeated all walks of life. Its quantity, size and complexity are all increasing sharply.
A large amount of data has been stored in databases and data warehouses in the form of text, graphics, images and multimedia [11].
Research from the International Data Corporation has shown that, as of 2003, humans had created a total of 5 EB of data, while in 2011 the amount of data copied and produced exceeded 1.8 ZB. It is expected that by 2020 global data usage will reach 35.2 ZB, which would need 37.6 billion hard drives of 1 TB capacity to store. On the one hand, these data broaden the scope of big data available for humans to gain wisdom. On the other hand, the value of a single unit of data is rapidly declining. Humans are submerged in the data ocean but thirsty for knowledge.
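As a rough check on the storage figure quoted above, the number of 1 TB drives needed for 35.2 ZB can be computed under two unit conventions. The helper name `drives_needed` is hypothetical, and the choice between decimal (SI) and binary prefixes is an assumption, since the source does not say which convention it uses.

```python
def drives_needed(zettabytes, base):
    """Number of 1 TB drives needed; base is 1000 (SI) or 1024 (binary)."""
    bytes_total = zettabytes * base ** 7   # 1 ZB = base^7 bytes
    bytes_per_drive = base ** 4            # 1 TB = base^4 bytes
    return bytes_total / bytes_per_drive


# Drive counts expressed in billions of drives.
si_billion = drives_needed(35.2, 1000) / 1e9      # decimal prefixes
binary_billion = drives_needed(35.2, 1024) / 1e9  # binary prefixes
```

Under decimal prefixes the result is 35.2 billion drives, and under binary prefixes it is about 37.8 billion. Neither convention reproduces the quoted 37.6 billion exactly, so that figure presumably rests on slightly different inputs; the binary convention comes closer.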
3.1 Garbage
Big data is voluminous and grows quickly, but it has very low value density, which means there is a lot of junk data [12]. Studies at the electron-positron collider can capture 40 million pictures per second, but only a few thousand are useful. The Romanian Internet security company BitDefender pointed out that spam and phishing information in social network games has increased by more than 50%. Compared with other online communication environments, social network users are more likely to unknowingly accept and load garbage information.
Big data and applications are closely related, and professional labeling of the data is the basis for rational analysis and sound judgment.
Both scientific experimental data and observational data need to be labeled by experts in the field.
According to IDC statistics, in 2012 only 23% of all information was useful, and only 3% of that potentially useful information had been labeled; the proportion of data that had been analyzed was much smaller. With the development of modern measuring techniques and digital recording methods, traditional, manual, experience-based methods of elimination and analysis have become powerless in the face of huge volumes of information.
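The IDC percentages above can be combined into shares of all information. This small sketch assumes that the 3% figure applies to the useful portion rather than to all data, which is one reading of the sentence; the variable names are illustrative.

```python
useful_share = 0.23        # share of all information that is useful (source figure)
labeled_of_useful = 0.03   # share of the useful information that is labeled (source figure)

# Share of ALL information that is both useful and labeled.
labeled_share_of_all = useful_share * labeled_of_useful

# Share of ALL information that is useful but still unlabeled.
unlabeled_useful_share = useful_share - labeled_share_of_all
```

Under this reading, less than 1% of all information (0.23 × 0.03 = 0.0069) is both useful and labeled, while roughly 22% is useful but unlabeled.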
3.2 Contamination
Data collected from the real world is contaminated. Moreover, as e