
Gluster File System Architecture

Introduction

GlusterFS is a scalable open source clustered file system that offers a global namespace and a distributed front end, and scales to hundreds of petabytes without difficulty. By leveraging commodity hardware, Gluster also offers extraordinary cost advantages that are unmatched in the industry. GlusterFS is the core of the integrated Gluster Storage Platform software stack.

In this paper, we discuss the key architectural components of GlusterFS and how these components impact production deployments on a daily basis.

At the heart of the design is a completely new view of how storage architecture should be done. The result is a system that scales linearly, is highly resilient, and offers extraordinary performance. Additionally, Gluster brings compelling economics by deploying on low cost commodity hardware and scaling horizontally as performance and capacity requirements grow. The Gluster Storage Platform integrates GlusterFS with an operating system layer and a web-based management and installation tool, simplifying the task of deploying petabyte-scale clustered storage to two steps and just a few mouse clicks.

Linear Scaling

Storage doesn't scale linearly. This seems somewhat counter-intuitive on the surface, since it is so easy to simply purchase another set of disks to double the size of available storage. The caveat in doing so is that the scalability of storage has multiple dimensions, capacity being only one of them.

Adding capacity is only one dimension; the systems managing those disks need to scale as well. There needs to be enough CPU capacity to drive all of the spindles at their peak capacity, the file system must scale to support the total size, the metadata telling the system where all the files are located must scale at the same rate as disks are added, and the available network capacity must scale to meet the increased number of clients accessing those disks. In short, it is not storage that needs to scale as much as it is the complete storage system that needs to scale.

Copyright 2010, Gluster, Inc.

The problem with the current approach in the market is that systems scale logarithmically. With logarithmic scalability, storage's useful capacity grows more slowly as it gets larger. This is due to the increased overhead necessary to maintain data resiliency. Examining the performance of some storage networks reflects this limitation, as larger units offer slower aggregate performance than their smaller counterparts.
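The contrast between linear and logarithmic scaling can be sketched numerically. The following is a toy model only, not a measurement of any real system: the 10 TB per-unit capacity and the log2 growth curve are assumptions chosen purely to show how the marginal gain from each added unit stays constant under linear scaling and shrinks under logarithmic scaling.

```python
import math

# Assumed per-unit raw capacity, for illustration only.
PER_UNIT_TB = 10.0


def linear_capacity(units: int, per_unit_tb: float = PER_UNIT_TB) -> float:
    """Useful capacity when every added unit contributes its full share."""
    return units * per_unit_tb


def logarithmic_capacity(units: int, per_unit_tb: float = PER_UNIT_TB) -> float:
    """Toy model: useful capacity grows with log2(units + 1).

    The curve is an assumption standing in for growing resiliency overhead.
    """
    return per_unit_tb * math.log2(units + 1)


def marginal_gain(model, units: int) -> float:
    """Extra useful capacity gained by adding one unit to `units` units."""
    return model(units + 1) - model(units)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(
            f"{n:>5} units: "
            f"linear gain {marginal_gain(linear_capacity, n):.2f} TB, "
            f"logarithmic gain {marginal_gain(logarithmic_capacity, n):.4f} TB"
        )
```

Under these assumptions, the linear model gains the same amount per added unit at any size, while the logarithmic model gains less and less, which is the behaviour the paragraph above attributes to current systems.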

To overcome this limitation, it is necessary to completely revisit the underlying architecture. Any system that requires end-to-end synchronization of metadata or offers a limited number of ingress/egress networking ports must be revisited from its base architecture. Those solutions that cannot act as a cluster of independent storage units are bound to find a scalability limitation sooner rather than later.

The Fundamental Shift

There are three fundamental changes to how storage must be done in order to achieve true linear scalability:

1. The elimination of metadata synchronization and updates.

2. Effective distribution of data to achieve scalability and reliability.

3. The use of parallelism to maximize performance.

The impact of these changes is significant: scalability is achieved while maintaining resiliency. Even better, a well-balanced system also results in improved performance.

To understand how these impacts happen with the three changes listed, let's dig into how these changes work.

Metadata: The Gluster Key to Scalability

One of the most important advantages found in the Gluster architecture is its liberation from any dependency on metadata, unique among all commercial storage management systems. This fundamental shift in architecture addresses the core issues surrounding metadata in file systems.

Rather than wrestling with the complexities of metadata in a centralized or distributed environment, the Gluster team simply got rid of it.

The result:

The unique "no-metadata server" architecture eliminates a bottleneck and provides increased scalability, reliability, and performance.

Background: Why Metadata Matters

For file systems that depend on it, metadata is the heart and soul of how data is organized. Those who are intimately involved with resolving metadata issues are often already familiar with the challenges that come with scaling, especially in a distributed environment. Before diving into the nuances of how Gluster works without metadata, let's stop for a moment and get some background on the topic. Those already familiar with how metadata works may wish to skip the background material.

How Does Metadata Impact Performance?

Gluster routinely demonstrates superior performance in customer deployments, even under circumstances customers expect to severely limit performance. For example, a simple measure of performance involves timing how long it takes to read or write a single large file. That is called "sustained sequential access", and it is well known to be the easiest thing any file system can be asked to do. Such brute force metrics tell the customer nothing about how the system would perform in a real world environment where other tasks are being handled simultaneously, or where the file (large or small) is being accessed piecemeal at random offsets.
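The two access patterns described above can be compared with a small timing harness. This is a minimal sketch under stated assumptions, not a rigorous benchmark: the 64 KiB block size, the temporary test file, and all function names are hypothetical choices made for illustration, and on a local disk the operating system's page cache may hide most of the difference.

```python
import os
import random
import tempfile
import time

# Assumed block size for illustration only.
BLOCK_SIZE = 64 * 1024


def make_test_file(size_bytes: int) -> str:
    """Create a temporary file filled with random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def read_sequential(path: str, block_size: int = BLOCK_SIZE) -> int:
    """Read the whole file front to back; return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_random(path: str, reads: int, block_size: int = BLOCK_SIZE,
                seed: int = 0) -> int:
    """Read `reads` blocks at random offsets; return the number of bytes read."""
    rng = random.Random(seed)
    size = os.path.getsize(path)
    # Keep each offset far enough from the end to allow a full block.
    max_offset = max(1, size - block_size + 1)
    total = 0
    with open(path, "rb") as f:
        for _ in range(reads):
            f.seek(rng.randrange(0, max_offset))
            total += len(f.read(block_size))
    return total


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    path = make_test_file(64 * 1024 * 1024)
    try:
        seq_bytes, seq_time = timed(read_sequential, path)
        rnd_bytes, rnd_time = timed(read_random, path, 1024)
        print(f"sequential: {seq_bytes} bytes in {seq_time:.3f} s")
        print(f"random:     {rnd_bytes} bytes in {rnd_time:.3f} s")
    finally:
        os.remove(path)
```

Both functions move the same kind of data, but only the sequential read corresponds to the "sustained sequential access" metric. A more realistic test would also run several clients concurrently, which this sketch does not attempt.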

The more complicated the workload, the more you would observe metadata also being exercised, in equal or greater proportion to the number of I/O events directed toward the contents of each targeted file. This issue comes to the fore especially if the file system being tested has been replicated or otherwise distributed so that users can simultaneously access the same data sets with complicated random access patterns, all being tested in the context of permissions and other attributes contained in the metadata associated with each and every file and directory or subdirectory (which by definition must also be replicated and distributed).

Also consider that an additional layer of complexity implies overlapping layers of mixed latency among the remote file system clients being used by the end users. Their workload arrives via high-performance LANs, across high-performance WANs (slower than LANs), and across much slower networks (sluggish LANs or narrow-pipe WANs) in a variety of geographically distributed settings. This creates inherent complexity for the distributed metadata updates, as well as for the processes by which any updates to the data itself must be carefully applied so as to be inherently correct (i.e., with chronological sequence preserved).

Such simplistic, brute force one-large-file read or write benchmarks begin to pale by comparison to the real world or to more subtle and refined metrics, as well as to the especially challenging nature of the actual real world workloads commonly experienced by robust solutions like Gluster, as deployed daily all over the world.

One big reason Gluster performs so extraordinarily well in actual customer deployments, and across a wide variety of demanding benchmarks where real world workloads are more realistically simulated, is simple: Gluster does not have a bottleneck regarding metadata. In fact, it does not need to scale its handling of metadata at all!

Let's examine how and why that is the case.

What are the Potential Architecture Bottlenecks Associated with Metadata?

It is the fundamental nature of metadata that it must be synchronously maintained in lockstep with the data. Any time the data is touched in any way, the metadata must be updated to reflect this. Many people are surprised to learn that for every read operation touching a file, this requirement to maintain a consistent and correct metadata representation of "access time" means that the timestamp for the file must be updated, incurring a write operation. Yes, there are optimization techniques invo