Dynamic Bigdata and Security with Kerberos
Author: Sandesh Manohar. Published by: IJARCET

ISSN: 2278-1323. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 4, Issue 3, March 2015

Dynamic Bigdata and Security with Kerberos
Sandesh Manohar (1), Sunil Salunkhe (2)
(1) Master of Computer Applications, IMCOST, Thane
(2) Master of Computer Applications, IMCOST, Thane

Abstract - "Dynamic Big data" is an emerging concept that denotes large amounts of data and their dynamic processing for analysis. It helps many organizations to predict market behavior, which supports their success in business. We are entering the age of "Dynamic Bigdata". Dynamic big data is the fast processing of data accumulated from various sources, giving immediate results after the analysis. Security in big data is developing at a rapid pace and includes information security, data privacy, and the protection of data, applications, and the related infrastructure with the help of policies, technologies, controls, and big data tools. With the right solutions, organizations can dive into all data and gain valuable insights that were previously unimaginable.

Index Terms - Dynamic Bigdata, Bigdata analysis, Distributed data, Bigdata Security

I. INTRODUCTION

Big data refers to collections of data sets whose size is beyond the ability of commonly used software tools, such as database management tools or traditional data processing applications, to capture, curate, manage, and analyze within a tolerable elapsed time. Big data sizes are constantly increasing, ranging from a few dozen terabytes to many petabytes of data in a single data set. Processing this growing volume of data requires new technology. To meet the demands of handling such large quantities of data, new platforms of "big data" tools are being developed.

"Big data are high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." The trend toward larger data sets, which motivates new technology for Dynamic Big Data, is due to the extra information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to, among many things, "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."

Scientists regularly face obstacles due to large data sets in many areas, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. These obstacles also affect Internet search, finance and business informatics. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks.

Big Data is difficult to work with using most relational database management systems and desktop statistics and visualization packages; the alternative is "massively parallel software running on tens, hundreds, or even thousands of servers." What is considered "big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain. For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.

II. DYNAMIC BIGDATA

Dynamic Bigdata is the dynamic processing of data as it is accumulated from the various information sources in small blocks of data storage. Dynamic data, or transactional data, denotes information that is asynchronously changed as further updates to the information become available. Dynamic data is also different from streaming data, in that there is no constant flow of information. Rather, updates may come at any time, with periods of inactivity in between. "Dynamic data" is reused or changed frequently and therefore needs to be kept in the office proper ("near" storage).

Data is increasing day by day, and regular analysis of the data is necessary to obtain immediate results about changing market behavior. With large data to be processed in a short span of time, new technology has to evolve that presents results in a visual manner. Hadoop stores large data sets in huge clusters. With some improvement, it can process a continuously changing large volume of data, that is, dynamic big data.

III. HADOOP TECHNOLOGY

A. Hadoop

Hadoop provides a distributed filesystem and a framework for the analysis and transformation of very large data sets using the MapReduce functionality. While the interface to HDFS is patterned after the Unix filesystem, faithfulness to standards was sacrificed in favor of improved performance for the applications at hand. An important characteristic of Hadoop is the partitioning of data and computation across many (thousands) of hosts, and the execution of application computations in parallel close to their data.
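As a rough illustration of the MapReduce model mentioned above, the following Python sketch runs the map, shuffle and reduce phases in a single process. It does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input splits are assumptions made only for this illustration. On a real cluster, each split would be mapped on a host that stores it.

```python
from collections import defaultdict


def map_phase(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run map over every split, then shuffle and reduce the results."""
    mapped = []
    for split in splits:  # on a cluster, each split is mapped near its data
        mapped.extend(map_phase(split))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: two splits of one logical data set.
    print(word_count(["big data big insight", "dynamic big data"]))
```

The map and reduce functions never share state, which is what lets a framework run many copies of them in parallel on different hosts.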
A Hadoop cluster scales computation capacity, storage capacity and I/O bandwidth by simply adding commodity servers. The following features help keep Hadoop clusters highly functional and highly available:

• Rack awareness allows a node's physical location to be taken into account when allocating storage and scheduling tasks.
• Minimal data motion. MapReduce moves compute processes to the data on HDFS and not the other way around. Processing tasks can occur on the physical node where the data resides. This significantly reduces network I/O, keeps most of the I/O on the local disk or within the same rack, and provides very high aggregate read/write bandwidth.
• Utilities determine the health of the file system and can rebalance the data across different nodes.
• Rollback permits system operators to bring back the previous version of HDFS after an upgrade, in case of human or system errors.
• Standby NameNode provides redundancy and supports high availability.
• Highly operable. Hadoop handles many cluster situations that might otherwise require operator intervention. This design allows a single operator to maintain a cluster of 1000s of nodes.

Hadoop's cost advantages over legacy systems redefine the economics of data. Legacy systems, while fine for certain workloads, simply were not engineered with the needs of Big Data in mind and are far too expensive to be used for general purposes with today's largest data sets. Cost is one of the advantages of Hadoop: because it relies on an internally redundant data structure and is deployed on industry standard servers rather than expensive specialized data storage systems, we can afford to store data that was not previously viable to keep.
And we all know that once data is on tape, it is essentially the same as if it had been deleted: accessible only in extreme circumstances. Data is growing rapidly, and unstructured data, which accounts for 90% of data today, is on the rise, so the time has come for enterprises to re-evaluate their approach to data storage, management and analytics. For specific high-value, low-volume workloads, legacy systems will remain necessary and complement the use of Hadoop, optimizing the data management structure in an organization by putting the right Big Data workloads in the right systems. The cost-effectiveness, scalability, and streamlined architectures of Hadoop will make the technology more and more attractive.

B. J-H Buffer (Joint Hadoop Buffer)

The Joint Hadoop buffer is a cluster of buffers where the initial data is stored in a small space and processed for analysis. It can be used with the Hadoop architecture. Initially, information accumulated from the various sources is stored in the buffers of the Hadoop architecture. Because there is a cluster of buffers near the datanode, each buffer has a data processing system in which the block of buffered data is processed for analysis. The final outcome is then obtained by calculating the average of the results produced by the data processing in each buffer.

[Figure: J-H buffer in Hadoop Architecture]

1) Namenode: The namenode is the node that stores the filesystem metadata, i.e. which file maps to which block locations and which blocks are stored on which datanode. The namenode maintains two in-memory tables: one that maps blocks to datanodes (one block maps to 3 datanodes for a replication value of 3) and one that maps each datanode to its block numbers.
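The two in-memory tables can be pictured as a pair of dictionaries that are kept in sync. The sketch below is a toy model under that assumption, not HDFS code: the class name NameNodeTables, its methods, and the block and datanode identifiers are all hypothetical.

```python
from collections import defaultdict


class NameNodeTables:
    """Toy model of the namenode's two in-memory tables (not HDFS code)."""

    def __init__(self):
        # Table 1: block id -> set of datanodes holding a replica.
        self.block_to_datanodes = defaultdict(set)
        # Table 2: datanode -> set of block ids stored on it.
        self.datanode_to_blocks = defaultdict(set)

    def add_replica(self, block_id, datanode):
        """Record one replica in both tables."""
        self.block_to_datanodes[block_id].add(datanode)
        self.datanode_to_blocks[datanode].add(block_id)

    def remove_datanode(self, datanode):
        """Drop a datanode from both tables; return the blocks that lost a replica."""
        lost = self.datanode_to_blocks.pop(datanode, set())
        for block_id in lost:
            self.block_to_datanodes[block_id].discard(datanode)
        return lost

    def under_replicated(self, replication=3):
        """List blocks that currently have fewer replicas than requested."""
        return sorted(
            block_id
            for block_id, nodes in self.block_to_datanodes.items()
            if len(nodes) < replication
        )


if __name__ == "__main__":
    tables = NameNodeTables()
    for dn in ("dn1", "dn2", "dn3"):
        tables.add_replica("blk_1", dn)
    tables.remove_datanode("dn1")
    print(tables.under_replicated())
```

Keeping the reverse table means that the blocks affected by a lost datanode can be found without scanning every block entry.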
Whenever a datanode reports a disk corruption of a particular block, the first table is updated, and whenever a datanode is detected to be dead (because of a node/network failure), both tables are updated.

2) Datanode: The datanode is where the actual data resides. Some interesting traits of the datanode are as follows. All datanodes send a heartbeat message to the namenode every 3 seconds to say that they are alive. If the namenode does not receive a heartbeat from a particular datanode for 10 minutes, it considers that datanode to be dead/out of service and initiates replication of the blocks that were hosted on that datanode onto some other datanode. The datanodes can talk to each other to rebalance data, move and copy data around, and keep the replication high. When a datanode stores a block of information, it maintains a checksum for it as well. The datanodes update the namenode with the block information periodically, and verify the checksums before updating. If the checksum is incorrect for a particular block, i.e. there is disk-level corruption for that block, the datanode skips that block while reporting the block information to the namenode. In this way, the namenode is aware of the disk-level corruption on that datanode and takes steps accordingly.

IV. DYNAMIC BIGDATA IN VEHICULAR TRAFFIC CONTROLLING AND MONITORING SYSTEM

The scale of traffic monitoring operations has also grown as the city continues to develop. It needs reliable storage for data collected through monitoring equipment. Also, with the development of new technologies and the upgrade of the electronic police checkpoint system to high-definition video images, image size is larger than before, demanding better storage performance. The average monthly data volume has now reached 10 terabytes, as traffic data continues to grow. Since data such as pictures and video are stored in different data centers in different divisions, it has become difficult to use.
It needs to be integrated, because some traffic management facilities, equipment, and application systems run in silos. Synchronized traffic lights are not the only thing in store for cities of the future. With the number of cars on the roads rapidly piling up in cities across the world, tech companies are designing numerous solutions to cut back on congestion and pollution. Companies are already active in this field, also developing software that is capable of predicting drivers' behavior, presumably to work in tandem with their traffic monitoring programs. Insights in Motion draws on transit data, census records, geo-spatial information and cell phone data in order to track the movements of thousands of people through the cities studied, correlating this data with speed of travel and travel times. By doing so, the system is able to gain an understanding of which transportation modes people are using and where people are going to and from (for example, to work, school, or shopping), and ultimately to better optimize transportation schedules and routes to deliver a faster, more efficient and safer service.

V. SECURITY ISSUE

Security-conscious organizations are turning to Big Data Security as the newest weapon in their arsenals against cybercrime. Software-based decryption adds significant extra load on a database server's CPU, and costs increase notably, along with complexity, when a solution is required to scale. Security operations work in silos in many organizations. Security vulnerabilities have to be handled twice: once by security teams, and then again by the IT operations team that could not initially identify the issue. The log data, both raw and in common format, is archived using high compression rates and can be retained for a number of years in flat file format, reducing the cost and complexity of log data management.
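A minimal sketch of the archiving step described above, using Python's standard gzip module. The raw log layout ("timestamp host message"), the JSON-lines common format, the file names and the function names are assumptions chosen for illustration and are not taken from any specific log management product.

```python
import gzip
import json
import os
import tempfile


def to_common_format(raw_line):
    """Parse a hypothetical 'timestamp host message' line into a dict."""
    timestamp, host, message = raw_line.split(" ", 2)
    return {"ts": timestamp, "host": host, "msg": message}


def archive_logs(raw_lines, directory):
    """Write raw and normalized logs as gzip-compressed flat files."""
    raw_path = os.path.join(directory, "raw.log.gz")
    common_path = os.path.join(directory, "common.jsonl.gz")
    # compresslevel=9 trades CPU time for the highest compression ratio.
    with gzip.open(raw_path, "wt", encoding="utf-8", compresslevel=9) as raw_file:
        for line in raw_lines:
            raw_file.write(line + "\n")
    with gzip.open(common_path, "wt", encoding="utf-8", compresslevel=9) as common_file:
        for line in raw_lines:
            common_file.write(json.dumps(to_common_format(line)) + "\n")
    return raw_path, common_path


def read_common(path):
    """Read back the normalized records from a compressed archive."""
    with gzip.open(path, "rt", encoding="utf-8") as archive:
        return [json.loads(line) for line in archive]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        lines = ["2015-03-01T10:00:00 node1 login failed for user bob"]
        _, common_path = archive_logs(lines, directory)
        print(read_common(common_path))
```

Keeping the raw file alongside the normalized one preserves the original evidence, while the common format is the one that is searched.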
Commercial replacements are available for existing log management systems, or the technology can be deployed to provide a single data store for security event management and enrichment. User authentication and access to data from multiple locations may not be sufficiently controlled. Platform management professionals are also needed to implement Hadoop clusters and to secure, manage and optimise them.

Kerberos

To enable permissions and authorization for Hadoop users, administrators first need to solve the challenge of user identity verification, a hurdle overcome through authentication. Hadoop has adopted a well-known authentication method named Kerberos, which was developed at MIT (Massachusetts Institute of Technology). Kerberos builds on cryptographic methods to establish ways for users and systems to identify themselves, and to create authentication tickets that can be presented to multiple services. Hadoop's security relies entirely on Kerberos. For perimeter security, Kerberos integrates with LDAP or Active Directory to obtain user information. The Hadoop vendors offer some tooling to manage Kerberos. As an alternative, Hortonworks also promotes Apache Knox as a way of ensuring perimeter authentication and event [...]

The many information security professionals who successfully monitor enterprise security in real time realize that Big Data requirements are nothing new to security information and event management (SIEM) technology. The point is that large amounts of data become Big Data only when you must analyze that data as a set. [...] perimeter security solutions such as firewalls and intrusion detection/prevention technologies.
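The ticket idea described above for Kerberos (authenticate once, then present tickets to several services) can be illustrated with a deliberately simplified model. This is not the Kerberos protocol: real Kerberos encrypts tickets and uses session keys and authenticators, whereas this sketch only signs tickets with HMAC-SHA256 as a stand-in. The class ToyKDC, the function service_accepts, and all keys, users and service names are hypothetical.

```python
import hashlib
import hmac
import json
import time


def _sign(key, payload):
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def _make_ticket(key, fields):
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return {"payload": payload, "mac": _sign(key, payload)}


def _open_ticket(key, ticket, now):
    """Return the ticket fields if the MAC is valid and the ticket is unexpired."""
    if not hmac.compare_digest(_sign(key, ticket["payload"]), ticket["mac"]):
        return None
    fields = json.loads(ticket["payload"])
    return fields if fields["expires"] > now else None


class ToyKDC:
    """Stand-in for a key distribution center that shares a key with each service."""

    def __init__(self, kdc_key, user_passwords, service_keys, lifetime=300):
        self.kdc_key = kdc_key
        self.user_passwords = user_passwords
        self.service_keys = service_keys
        self.lifetime = lifetime

    def login(self, user, password, now=None):
        """Authenticate once and return a ticket-granting ticket, or None."""
        now = time.time() if now is None else now
        expected = self.user_passwords.get(user)
        if expected is None or not hmac.compare_digest(
            expected.encode("utf-8"), password.encode("utf-8")
        ):
            return None
        return _make_ticket(self.kdc_key, {"user": user, "expires": now + self.lifetime})

    def service_ticket(self, tgt, service, now=None):
        """Exchange a valid ticket-granting ticket for a ticket to one service."""
        now = time.time() if now is None else now
        fields = _open_ticket(self.kdc_key, tgt, now)
        if fields is None or service not in self.service_keys:
            return None
        return _make_ticket(
            self.service_keys[service],
            {"user": fields["user"], "service": service, "expires": now + self.lifetime},
        )


def service_accepts(service_key, service, ticket, now=None):
    """Service-side check: return the user name if the ticket is valid, else None."""
    now = time.time() if now is None else now
    fields = _open_ticket(service_key, ticket, now)
    if fields is None or fields.get("service") != service:
        return None
    return fields["user"]
```

In this toy model the user's password is checked only once, at login. Each service then verifies tickets with its own key and never sees the password, which is the property the paragraph above attributes to Kerberos tickets.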
Big Data Issues and Challenges

Big data comes with numerous security issues because it encompasses many technologies, including networks, databases, operating systems, virtualization, resource scheduling, transaction management, load balancing, concurrency control and memory management. The big data issues are most acutely felt in certain industries, such as telecoms, web marketing and advertising, retail and financial services, and certain government activities. The challenges of security in big data environments can be categorized into network level, user authentication level, data level, and generic issues.

Network Level - Network-level challenges deal with network protocols and network security, such as distributed nodes, distributed data, and internode communication.

Authentication Level - The challenges categorized under the user authentication level deal with authentication methods such as administrative rights for nodes, encryption/decryption techniques, authentication of applications, and logging.

Data Level - The challenges categorized under the data level deal with data integrity and availability, such as data protection and distributed data. Combining data that have different levels of security is especially problematic, requiring designation of the mixed data with a high level of security restrictions.

Generic Level - The challenges categorized under the generic level deal with traditional tools and the use of different technologies.

Distributed Data - The software treats all the nodes as though they were simply one big pool of computing, storage, and networking assets, and moves processes to another node without interruption if a node fails, using the technology of virtualization. This is driven by the limitations of MapReduce-based systems in dealing with "varieties" in cloud data management: variety of storage and variety of processing.
Distributed Nodes - Big data is distributed data; this means the data is so massive that it cannot be stored or processed by a single node. The reason we have data problems so big that we need a large-scale distributed computing architecture to solve them is that the creation of the data is also large-scale and distributed. Open source Hadoop enables distributed data processing for "big data" applications across a large number of servers. The computation is done in any set of nodes. Basically, data is processed in those nodes which have the necessary resources. Since it can happen anywhere across the clusters, it is very difficult to find the exact location of computation.

Internode Communication - With nodes spread across a large geography, a cloud-based security solution requires a well-architected communications mechanism for inter-node message exchange. This exchange happens over a network distributed around the globe, consisting of wireless and wired networks. The consequence of communication issues is a higher waiting time at the network interface queue and lower performance.

Logging - Log management is the process of collecting logs from any device, aggregating them into a single searchable format, analyzing them through data enrichment, and retaining log data long-term. In the absence of such logs, it is very difficult to find out whether someone has breached the cluster or whether any malicious alteration of data has been made that needs to be reverted.

Secure Computation in Distributed Programming Framework - It utilizes parallelism in computation and storage to process massive amounts of data.

Hardware/Software Encryption - Encryption is transparent to users, can be applied on a file-by-file basis, and works in combination with external key management applications.

[...] cybercriminals.
This data, which was previously unusable by organizations, is now highly valuable, is subject to privacy laws and compliance regulations, and must be protected.

VI. SECURITY SOLUTIONS

Data security is a critical component of business management. Data protection solutions allow organizations to achieve strong data security without extensive deployment or management complexity. Data visualization for security remains extremely elementary, dominated by pie charts, graphs, and Excel spreadsheet pivot tables. Cisco will use its network infrastructure and cloud-based big data security intelligence for network security automation.

A. Protecting Big Data Sources

Organizations can leverage data from a broad array of sources, both structured and unstructured, for their big data initiatives. Data from databases, data warehouses, system logs, spreadsheets, and many other diverse systems may be fed into a big data environment. To establish data security for these diverse data sources, organizations can use the following solutions:

• Transparent Encryption. This encrypts and controls access at the file-system level. This encryption solution is easy to deploy because it does not require any changes to applications.
• Application Encryption. With this encryption, we can encrypt specific columns in an application before it writes the field to a database. By encrypting a specific column, we can ensure that a specific sensitive field will remain unreadable, even after it is imported into, and processed within, the big data environment.

B. Securing Big Data Frameworks

In big data environments, data is routinely replicated and migrated among a large number of nodes. In addition, sensitive information can be stored in system logs, configuration files, disk caches, error logs, and so on. Transparent Encryption efficiently protects data across all these areas, delivering encryption, privileged user access control, and security intelligence.
In addition, with Protection for Teradata Database, organizations can gain the comprehensive, granular controls required to secure the most sensitive assets across their Teradata environments, while maximizing the business benefits of their big data investments.

C. Safeguarding Big Data Analytics

Big data output comes in many forms, including on-demand dashboards, automated reports, and ad hoc queries. Very often, these outputs contain intellectual property that is very valuable to an organization, and a potential target of attack. To provide big data analytics security for these confidential assets, security teams can use the following solutions:

• Transparent Encryption. This encryption product can easily be deployed on servers, where it can encrypt big data outputs and control and monitor who accesses them.
• Application Encryption. We can use this encryption to secure specific fields that may be created in analytics applications.

VII. CONCLUSION

Dynamic big data can deliver better products, but to achieve this effectively, it should be used to test the questions that previously could not be answered until after the product had gone to market. What was impossible five years ago is now mundane in terms of computing power and capability. Dynamic big data could help many organizations to run their business successfully. The application of the J-H buffer in Hadoop will result in processing of data in the simplest form. In the future, significant challenges need to be tackled by industry and academia. There is an urgent need for computer scientists and social scientists to cooperate closely, in order to guarantee the long-term success of big data and collectively explore new territory.