Afful et al • Redefining the Concept of Big Data: A Ghanaian Perspective 38

Redefining the Concept of Big Data: A Ghanaian Perspective

Eleanor Afful1, Kofi Sarpong Adu-Manu2, Grace Gyamfuah Yamoah2, Jamal-Deen Abdulai2, Nana Kwame Gyamfi2, Edem Adjei2, Isaac Wiafe2 and Ferdinand Apietu Katsriku2*
1KACE AITI
2Department of Computer Science, University of Ghana
*Corresponding author: fkatsriku@gmail.com

ABSTRACT
The world is on the verge of a data tsunami. Voluminous amounts of unstructured data are being generated using different technologies. To manage the huge amounts of data being generated, a new concept of ‘Big Data’ has evolved. The emergence of ‘Big Data’ is leading to real transformation in the business world. Governments and commercial enterprises on the African continent are beginning to take an interest in the use of technologies associated with Big Data for the analysis of the enormous amounts of data they currently generate, and they wish to do so in real time. The advances being made in big data technologies have fuelled this uptake. Until recently, companies in Ghana did not realize the utility of big data analytics, due mainly to lack of knowledge and the limited penetration of these technologies. Increasingly, however, these companies now realize the difference in value that data analytics could make to their decision-making processes and to the development of strategies that will give them a competitive advantage. It has become clear to many of these corporate organizations that they are in possession of large volumes of data which, if properly analysed, can provide them with a wealth of knowledge to run their businesses more efficiently and productively. The analytics necessary for understanding this wealth of data are provided by big data technologies.
This paper seeks to redefine the concept of big data and reviews its development, the potential impact that big data can have on a developing economy, the sectors of the economy of Ghana that stand to gain most from the adoption of big data technologies and how these gains can be achieved. We propose that the concept of big data be defined more objectively by the use of a function. The paper shows how big data can be leveraged for rapid economic advancement. The paper additionally examines the investment prospects of adopting big data technologies for the economic environment of Ghana and some of the issues that organizations must resolve to successfully implement the technologies in Ghana.

Keywords: Big Data, Analytics, Economic development, Big Data Architectural, Ghana

Science and Development Volume 2, No. 1, 2018

Introduction
Traditional data modelling and organization methods have proved useful and appropriate for varied functions in the past few decades, and this is attested to by the phenomenal success of relational database systems. These traditional methods are coming under huge strain with the exponential growth in data, and in most cases the traditional systems are unable to cope. This growth in data has been fuelled in particular by the success of internet companies such as Google and Facebook.

Huge volumes of information are still available and unexploited because the existing data modelling and management tools are not well suited to handling such information. Data aggregation, which transforms data scattered across multiple sources into a new summary, is one of the key features used in databases, especially for business intelligence (e.g. extract, transform and load (ETL), online analytical processing (OLAP) and analytics/data mining).

For databases built on Structured Query Language (SQL), aggregation is used to prepare and envision data for a more profound level of analysis. Such an operation is however difficult and often impossible to perform on enormous volumes of data because of its memory and time requirements (memory-and-time consumption).

Database maintenance and optimization is a key activity for relational databases. As the number of queries from across multiple sources increases, optimizing query execution becomes difficult to handle. For databases bigger than relational ones, a key requirement is that they be maintained and optimized for continuous optimal performance; such a task is thus far from trivial.

Additionally, the data residing in the database must be highly structured and cleansed. Businesses spend significant effort to extract, transform and load data between data warehouses and relational databases. Enormous costs are involved in doing this, which greatly limits the breadth of data available for analysis. The current systems are not easily scalable and do not scale up to the combined increase in velocity, volume and variety as defined for big data.

This paper proposes a more objective definition of the concept of big data, and looks briefly at its development and the impact these technologies can have on a developing economy. The paper examines those sectors of the Ghanaian economy which could benefit the most from the application of big data technologies and how these benefits might be achieved, based on new architectures.

Background
Big data was initially defined using the three V's: Volume, Velocity and Variety. Of late, however, two other parameters have been included: Veracity and Value. The big data concept may then be depicted as shown in Figure 1. Variety is assured through the numerous and diverse data sources, each contributing some quantity of data per unit time to the data volume. The amount of data generated per unit time may not be static but dynamic and subject to change over time. The data being generated from these sources may be either structured or unstructured. Volume is the summation of all the data coming from diverse sources per unit time and arriving at one processing centre. In the literature, volume has been defined as how much data there are; velocity as the rate at which new data are created and how quickly the data are processed; and variety in terms of the format of the data, whether structured, semi-structured or unstructured.

The two new dimensions added lately are veracity, which refers to the trustworthiness of the data, and value, which refers to the gain businesses can derive from the data. Other dimensions have also been used, notably volatility and validity; however, these have not gained widespread acceptance and use. It is noteworthy that these definitions only provide a qualitative view of what is described as big data. Some researchers have sought to define ‘big data’ in terms of a fixed volume such as petabytes or zettabytes. There is however no consensus on exactly what volume would constitute big data. Velocity may be defined as the rate of change in the volume of data generated and transferred to the enterprise office. Value is derived from the analytics performed on the data. Ultimately, volume and velocity are intimately linked to the processing capacity of the system under consideration and hence to the business needs. Assigning a numerical value to what will qualify, in terms of volume, as big data is thus not very useful. What may qualify as big for one enterprise may not be so for another. A helpful definition is “when the data arriving begins to exceed the processing capacity of the conventional database and data warehouse solutions available to an enterprise”. For many businesses, therefore, big data becomes a moving target, as they need to constantly evolve new solutions for the data they process. The volume of data will depend on the rate at which new data are being generated and the rate at which they are arriving. As such, volume and velocity are intricately linked. What has not been mentioned as far as velocity is concerned is the rate at which the data are being processed. This constitutes another aspect of velocity, one not intricately linked to volume. Even though it may be argued that the processing rate affects the volume of data yet to be processed, it does not affect the total volume of data an organization has.

Fig. 1: Big data concept and interaction between the V's (diagram labels: data generating sources 1, 2, …, N / Variety; Storage Systems; Enterprise office; Analytics; Value)

Veracity of data is also not a new concept. Businesses have always sought to ensure the quality of the data they obtain. What is new is the fact that, with big data, a substantial proportion of the data is from external sources and hence additional measures now need to be taken to ensure the quality of the data. This, coupled with the wide variety of data, necessitates new requirements. Data cleansing thus becomes an additional and important element of the processing cost.

Creating business value out of data has always been a driving motivation for many businesses, so this is not a new concept. What provides a business with a competitive advantage is the insight it can derive from its data warehouses to enable it to make better, real-time, smart decisions. This implies a real need for detailed and in-depth knowledge and hence the demand for new analytical tools.

The key new dimension now is therefore variety. Businesses have always been interested in data for the value that may be derived from the insights such data provide on analysis. When the data grow beyond their current processing capability, they simply upgrade their systems to more powerful ones, in most instances using the same relational database technology. As such, growing data volumes have always been an issue that businesses have to contend with. The rate at which new data are being generated and arriving, and the variety of formats the data take, are hence the key issues. Whilst in the past businesses had to deal with data of the same type and form, the explosion in new data sources means that data are constantly being generated from new sources and taking different forms. Businesses now have to integrate different forms of data: unstructured data, graph data, voice, images, video, etc. What is the relation between the value we derive from big data and the other dimensions, namely volume, velocity and variety? It is proposed that such a relation be established.
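The relation proposed above can be sketched in symbols. This is only an illustration under stated assumptions: the names $r_i(t)$, $S(t)$, $v(t)$, $C$, $K$ and $f$ are introduced here and do not appear in the original text, and no specific form of $f$ is implied.

```latex
% Hypothetical notation, introduced for illustration only.
% r_i(t): data generated per unit time by source i, for i = 1, ..., N
% S(t):   volume arriving at the processing centre per unit time
% v(t):   velocity, taken as the rate of change of the volume
% K:      variety, e.g. the number of distinct data formats among the sources
% C:      processing capacity of the conventional systems available to the enterprise
\begin{align}
  S(t) &= \sum_{i=1}^{N} r_i(t) \\
  v(t) &= \frac{\mathrm{d}S(t)}{\mathrm{d}t} \\
  \text{Value}(t) &= f\bigl(S(t),\, v(t),\, K\bigr)
\end{align}
% Under the working definition quoted in the Background section, the data
% become "big" for a given enterprise once S(t) > C.
```

Read this way, whether a data set counts as big depends on the enterprise-specific capacity $C$ rather than on a fixed number of petabytes, which is consistent with the argument that big data is a moving target.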
In our opinion, it would be helpful to view big data and the value obtained from it as a function dependent on the parameters of volume, velocity and variety.

Given some of these challenges, it is necessary to develop new technological means of managing, analysing, visualizing and extracting meaningful information from the large and complex heterogeneous data sets that are being generated from diverse and distributed sources. Progress in this area will enhance scientific innovation and provide new paths to scientific inquiry. The development of new data analytic tools and algorithms will also be an important outcome of any progress made. Other benefits will include the development of scalable data infrastructure and architecture, which will ultimately lead to a better understanding of social processes and interactions for greater security, economic growth and, in general, an improved quality of life and the wellbeing of people.

To efficiently model the data requirements for business intelligence and analytics, a new technology has emerged called NoSQL (commonly expanded as "Not only SQL"), a class of distributed non-relational databases with variations in implementation. NoSQL was designed to cater for the demands of data that were being generated by the web. The origins of NoSQL can be traced to the work done at Google in building a proprietary database, Bigtable. Bigtable was designed to overcome some of the inherent limitations of Relational Database Management Systems (RDBMS). Some of these limitations are the need for specialized and robust servers that are less prone to failure; the length of time required to process queries; and the need to have structured data. Since this early work, many companies have also turned their attention to developing such systems, which promise low implementation cost in terms of hardware requirements and the ability to scale massively in a horizontal direction, i.e. by adding thousands of nodes so that storage and retrieval are distributed across them using parallel processing techniques, thereby reducing storage and retrieval time. All these allow the setting up of server farms very quickly and cheaply. These systems also have a high level of fault tolerance, since the same data are stored on different machines, and they do not have any limitations on structure, which means one could store almost anything together.

With NoSQL it is possible to efficiently and cost-effectively build massive computing systems capable of handling the exponential growth in the volume of large data sets currently being experienced. Whatever the format of the data (structured, unstructured or semi-structured), the techniques underpinning NoSQL ensure that the limitations imposed by RDBMS on data size, format and speed are eliminated, leading to fast and efficient ways of processing and analysing a variety of data in real time, bringing real benefits to businesses.

With regard to the velocity with which new data are being generated, NoSQL has the capability to process terabytes and even exabytes of data in real time. The new techniques implemented in NoSQL process, extract, load and transform data within the database, eliminating the need for the data to be transferred in and out of the database of the data warehouse. The advances in processing and storage capabilities in computing technology in the last decade, together with increased speed, have effectively eliminated data size as a constraint.

Platforms and tools for big data
Even though arguments have been made to the effect that most of the data generated today are either unstructured or semi-structured, emerging big data technologies could be divided into two categories: those for structured data and those for unstructured data. To handle structured big data, a number of customized technologies have emerged. These technologies are aimed at storing and retrieving the large amounts of data associated with big data. The Google File System (GFS) is an internet-scale file system, a robust and scalable system which provides the sort of reliability required for internet applications. Object-store techniques aim to improve on redundancy and data availability. The Amazon Simple Storage Service (S3), OpenStack Swift and Nirvanix cloud storage are examples of this approach. Underlying many of these solutions is the massively parallel processing (MPP) technique. MPP is based on a distributed processing architecture consisting of a series of nodes controlled by a master. When engaged, the master distributes a query across the nodes for maximum processing efficiency. Similarly, the system can perform automatic, fast data import and export through the same underlying mechanism. Almost all the vendors operating in this domain combine software and hardware into a single appliance. This ensures consistency in the hardware, which is crucial to obtaining optimum performance. Data locality plays an important role in obtaining high performance in big data analytics. By processing the data as close as possible to its generating source, we minimize the highly prohibitive costs of data transfers. MapReduce exploits the concept of data locality to give improved performance. Hadoop is an open source implementation of MapReduce. It is based on the Hadoop Distributed File System (storage) with a distributed processing architecture consisting of a series of nodes controlled by a master. When engaged, the master distributes a query across the nodes for maximum processing efficiency. This programming paradigm allows for massive scalability across hundreds or thousands of servers in any Hadoop cluster of nodes, much like Google's core infrastructure, which requires different skill sets. Building analytic solutions requires knowledge of a new set of Application Programming Interfaces (APIs), and this is one major drawback. MapReduce jobs are typically written in the Java programming language; the term refers to two separate and distinct tasks that Hadoop programs perform. In the first task, the program maps the input data and processes it to produce key/value pairs. The reduce function takes those key/value pairs and then combines or aggregates them to produce the final results. The name MapReduce gives a clue to the order in which the tasks are carried out: the reduce job is always performed after the mapping has been done. Combining data warehousing, data mining and relational databases with techniques such as optimization, simulation, visualization and predictive analysis for big data sets provides better strategies for obtaining insight from massive data sets, enabling better decisions.

Implementation Areas of Big Data
Key/Value Store is a fundamental data model used, for example, in Hadoop, Voldemort, DynamoDB and Memcached. Key-value databases are lightweight, schema-less, relationship-less and transaction-less data stores used primarily for storing temporary data in memory. Examples of key-value databases used for very large scale storage systems include Riak, Redis and MemcachedDB. The key can be synthetic or auto-generated, while the value can be a String, JavaScript Object Notation (JSON), a BLOB (binary large object), etc. The key-value type basically uses a hash table in which there exists a unique key and a pointer to a particular item of data. There can be matching keys in different containers, which are made up of logical groups of keys. Performance is enhanced to a great degree because of the cache mechanisms that accompany the mappings. Key/value pairs however fail to offer ACID (Atomicity, Consistency, Isolation, Durability) capability, as they fail on consistency. This capability must be provided by the application itself. To read a value one needs to know both the key and the bucket, because the real key is a hash (Bucket + Key). The key-value store database model is popular because it is easily implemented. A major weakness of this scheme is that it becomes increasingly difficult to maintain unique keys as the volume of data grows. To address this challenge, complex schemes are introduced to generate unique character keys for very large sets.

Document Oriented Database
The idea here is to aggregate the data, mainly in the form of key-value pairs, which are then compressed into a searchable record format. XML (Extensible Markup Language), JSON (JavaScript Object Notation) and BSON (which is a binary encoding of JSON objects) are some of the typical encoding schemes available. One significant distinction between a key-value store and a document store is that a document store has associated with it the attribute metadata related to the stored content. This provides a means of querying the data based on the stored content. Unlike traditional relational databases, where data and relationships are stored in tables, in this scheme they are simply a collection of documents independent of each other. Document-style databases are schema-less, and this makes adding fields to JSON documents a simple task, without having to first define the required changes. The most commonly used document-based databases are Apache CouchDB and MongoDB. To store data, CouchDB employs JSON, with JavaScript as the querying language through MapReduce, and the Hypertext Transfer Protocol (HTTP) to implement the application programming interface.

Column Family Database
A column-family database provides the capability to organize the rows as groups of columns. This capability implies that each single row of a column-family database has the capacity to contain several columns. All the columns which are related are grouped together as column families, providing the capability to retrieve the columnar data for multiple entities. This is achieved through an iterative process. The flexibility that column families provide to applications enables a wide range of complex queries and data analyses to be performed. This is reminiscent of the functionalities supported by a relational database. This design enables them to store massive volumes of data running into billions of rows, with each row containing hundreds and possibly thousands of columns. Significantly, a column-family database can still provide very fast access to these vast quantities of data, due mainly to a most efficient storage mechanism. If a column-family database is well designed, it will be fundamentally faster and have greater scalability than an equivalent relational database holding the same volume of data. This performance is achieved at a cost: it can only support a specific set of queries, unlike the queries in a relational database, which are more generalized. Designers of column-family databases must ensure that column families are optimized for the most commonly used queries of the applications under consideration. In contrast, the majority of relational Database Management Systems (DBMS) store their data in rows. Storing data in columns, as done in column families, has the benefit of allowing fast search/access as well as providing for efficient data aggregation. In relational databases a single row is stored as a contiguous disk entry. As a result, different rows of data may be stored as different entries on the disk. On the other hand, columnar databases store all the cells which correspond to a particular column as a contiguous disk entry; this makes the search/access time much faster than can possibly be achieved in a relational database.

Graph Database
The graph database is the final variant of NoSQL database management systems that is considered in this work. Unlike the other models, graph-based DBMS models represent the data as tree-like structures, using edges to connect the various nodes, as in graphs. Just as in mathematics, certain operations are much simpler to perform using these types of models. These databases are commonly used by applications where it is necessary or required to establish boundaries for connections. For example, when you register on a social network of any sort, your friends' connections to you and their friends' relations to you are much easier to work with using graph-based database management systems. An example is Neo4j, the most widely used graph store apart from RDF (Resource Description Framework) triple stores.

Architectural (Conceptual) framework
In order for Ghana to leverage big data for economic development, a conceptual framework supporting the activities of all stakeholders (individuals, private and public sector) should be developed. This architectural or conceptual framework should take into consideration the role of companies or organizations, policy makers, institutions, and individual users towards the adoption of big data for economic development.

Several frameworks have been discussed in the literature (Manyika et al., 2011; Wamba et al., 2015; Global Pulse, 2012). We propose in this paper an architectural framework that can support the use and implementation of big data to boost the Ghanaian economy, as depicted in Figure 2. This framework seeks to point out the benefits of the use of big data in driving the economy of Ghana. From the framework, a collaboration between the public and private sectors in Ghana is a step towards an integrated economy, and this can boost productivity significantly with the implementation of big data. Companies/industries in Ghana are expected to provide incentives to enhance the economy and also for users in the form of rewarding innovation. Big data analytics offer a huge economic impact for organizations (Gangadharan, 2014).

Policy makers produce and use data to facilitate enhanced policymaking processes; it is incumbent upon them to also play their part by promoting and fostering data-driven innovation and growth throughout economies (Andrade et al., 2014). For big data to realize its potential in Ghana through innovation driven by advances in technology, policymakers need to articulate coherent guidelines, standards and policies on the use of data and the associated technologies. A possible way of achieving this objective is through openness and transparency: ensuring that public data is accessible using open data formats; promoting legislation which is balanced and which takes into consideration the competing needs of all sectors of the economy; and supporting education that focuses on equipping students with data science skills and competencies (Andrade et al., 2014).

Fig. 2: Architectural framework for big data implementation in Ghana (diagram labels: Users (encourage innovation); Companies/industries: incentives/workflows, enhance economy; Public/private sectors collaboration: technology and infrastructural support (data analytic tools/platforms); Policy Makers (Gov't): creates enabling environment, supports digitization; Companies: creates value for data; Towards Digitized Data; Institutions: capacity building (analytics/managerial expertise); Individual Users: access/use digital data; Consumer devices (PCs, laptops, smartphones))

Institutions also have a role to play in the progressive development and implementation of big data in Ghana. One of the major challenges is lack of expertise. Individuals with technical know-how in big data analytics are in short supply. According to Manyika et al. (2011), a major limit on realizing value from big data is the acute shortage of skills and talent, especially of people with profound capability and proficiency in statistics and machine learning, and of managers and analysts who know how to operate companies by using insights from big data. This places a huge responsibility on educational institutions and Information Technology training centres to build the right capacity, so that the nation has a pool of top talent ready to take up the task of analytics. This requires that the relevant departments be strengthened to enable them to fulfil those expectations. The role that Computer Science can play in transforming economies in the 21st century and beyond is well known, and the arguments in support of this have been well made. This must however be driven from the highest echelons of government. A conscious effort must be made on the part of government to move towards an economy driven by advances in computing technology, hence the need to support computer science education in the country.

Global Pulse (2012) discussed the foremost concerns and challenges raised by big data for development and suggested probable ways of addressing some of them. They went on to discuss the sources of data for development in a growing environment such as Ghana. For a developing country like Ghana to benefit from the full potential of big data, these data sources should be taken into consideration. Most sectors of the economy in Ghana still depend to a great extent on paper-based record-keeping; as such, the data source is not automated and hence not easily digitized. The data sources can be digitally generated, passively produced, automatically collected, geographically accessible and continuously analysed (Global Pulse, 2012). These data sources are relevant for big data for economic development. Ghana generates massive amounts of digital data from different streams of the economy (online data), from different organizations and online platforms. This is however nowhere near what is possible. In a big data environment, the data are expected to be generated or created digitally such that they can be processed or manipulated by an electronic device. The recent Police and Fire Service e-Recruitment drive is an example of how such migrations could be achieved (Ghana Police Service 2016 e-recruitment, Ghana Fire Service e-recruitment 2016). The data produced should have the ability to interact with other digital services. They should be collected automatically after they have been produced. The location or time span for one operation should be available/accessible and should be analysed in real time with no difficulty. Until the country has some or all of these sources of big data for development in place, Ghana will not be able to leverage big data for economic development.

Big Data and Ghana’s Economy
The world has reached a stage where data are all around us. These data can be obtained from digital images, social media streams, financial and banking transaction records, wired and wireless sensors, GPS signals, and a myriad of other sources. Today, approximately 12 terabytes of data are generated from tweets alone on a daily basis. The flow is quickening and shows no signs of abating, with nearly 90% of the data in the world today created in only the last two years. We are truly facing a data tsunami, and there will be 44 times more data than is currently available by the year 2020 (Manyika et al., 2011 as qtd by Gobble 2013). The advent of disruptive technologies such as the Internet of Things plays a huge contributory role in the phenomenal growth in data that we now witness.

Recently, the Ghanaian economy has seen a great boost in the emergence of companies and organizations that collect increasing amounts of digitized data from clients and employees. Some of these sectors of the economy are the oil and gas industry, the healthcare industry, financial services (banks), telecommunication industries, government agencies, retail shops and other data-driven businesses. In this paper, only a few of these areas will be discussed, along with the potential impact of big data analytics. The increase in telecommunication network providers in the country is an indication that the majority of Ghanaians have subscribed to one or more of these telecom networks. Ghana can take advantage of the opportunities big data offers, which can be leveraged to create a better environment for its citizens and organizations.

In 2011, the Government of Ghana introduced some of its services online. The online services are made available at the Government of Ghana web portal. The web portal promises to serve as a one-stop window for services and information offered by all Ministries, Departments and Agencies (MDA), MMDAs and other relevant Government of Ghana agencies. The portal consists of four sub-portals, categorized as Citizens, Non-Citizens, Businesses and Governments, as shown in Figure 3. This is a clear indication that substantial structured and unstructured data will be obtained by the government (Ghana Government e-Services Portal, 2011). Since Ghana performs these services online, it is possible that most of the data being generated and collected are from different devices and are of different formats (photos, videos, text, audio, etc.). This makes the data unstructured.

Fig. 3: e-Services Portal of the Government of Ghana

In recent times, Ghana has seen a major shift from paper-based to electronic record keeping in most of the agencies and ministries. For example, the National Health Insurance Authority (NHIA) recently introduced biometric data collection for all clients on their scheme. Other agencies such as the National Identification Authority (NIA), the Electoral Commission (EC), the Ghana Education Service (GES), the Social Security and National Insurance Trust (SSNIT) and the Ghana Health Service (GHS) are all transitioning from traditional data collection and processing to electronic data processing and collection. Figure 4 depicts how these agencies access their individual databases for their day-to-day transactions.

Fig. 4: Shift from paper based records to electronic records (diagram labels: Segmented databases; Agencies/Sectors: NHIA, NIA, EC, GES, SSNIT, GHS)

One major challenge here is that these agencies do not share their data. Thus though they produce and collect a lot of data, they are sitting in data warehouses without

(Suresh, 2012). In the health industry, the introduction of biometric registration of patients and employees has attested to the data acquisition.
wTihthuisn tthhoatu sgehc ttohr.e Ey apchro duce being put aton dm cuochll eucset. aU lsoerts oafr ed uantaab ilte itso sgiatitnin rgem ino ted ata wofa trheehsoe uasgeesn cwieist hgoenuetr abteei nhgug pe uatm toou mntu ocfh d uatsae .a bUosuet rs are access to this data, and when they do get access, there their clients especially when there are transactions to be are no anaulnytaibcalle ttooo lgs afionr raenmy omteea ancincgefsusl itnof othrmisa tdioanta andp rwocheesnse dth. eAy s hdioft igne tth aec Gcheasns atihane reec oanreo mnyo faronmal pyatipcear-l tools to be extrfaocrt eadn yfr omme aint. inItg fisu li minpfeorramtivaet itohna tt op obleic eiexs tractbeads efdr otmo d iitg. iItat li sd aitma pise irna ttihvee rtihghatt pdiorelicctiioens . bThe pisu sth iinft place be put in place to regulate sharing and accessing data cannot be accomplished solely on the basis of imported from a cotom rmeognu laptlaet fsohrmar inwgit hainnd tahce cpesusbilnicg sdeacttoar ;f romt eac hcnoomlomgioesn bpulat tfmorumst wbeit hdirni vtehne pbyu blloicca sl eccotontre natn d the the governgmovenetr ncmoueldn tc rceoautel dh icgrhe avatelu eh ifgohr svuaclhu ed aftao.r sucahch dieavtead. tRharothugehr trhesaena rlcoho lkea fdoerrs hsoipl ubtyio onusr oacuatdsiedmei co f the Rather than look for solutions outside of the country, institutions. departmenctos uonf ctorym, pudteepr ascritemnceen sthso uoldf bceo stmrepnugttehre nesdc ience should be strengthened to provide the research to providlee athde rsrehsiepa rnchec elesasdaerrysh fiop r nitesc ersesalriyz aftoiro nit.s realization. HealthcareThe financial services arena has also seen aF osrh aanrpy hineaclrteha isnef oirnm iattsio onn slyinstee mtr aton sbaec teiffoencst.iv Te hite i su se of The finanmciaolb isleer vcicoems maruennai cahtaiso na lsdoe vsieceens ah assh abrpe comiem prourttainte t hfaotr itp hears oacncaels sc toom amll huenailctha tdioatnas p earntdin eanlts o for increase in its online transactions. 
The use of mobile to the case communicfaintiaon cdieavli caens dh abs ubescinoemses rotruatninsea fcotri opnerss oinacll uding money utrnadnesrf ecor,n sjiodbe rasteiaornc hin, rbeualy itnimge .a Innd msaenllyi ng of countries of the world this data would come from many communicgaotioodnss aasn dw aellslo a sfo fro fir ntahnec itarla nasnfde rb ousfi ndeastsa sucdhiff aerse nstc haonodl ugnrcaodnense, cetexda msyisnteamtios,n H raemsumltosn, ds toetc kal .l evels transactions including money transfer, job search, (2010). Lewis et al. (2012) reported that in low and buying and selling of goods, as well as for the transfer midd1l4e income countries 42% of health institutions use of data such as school grades, examination results, ICT to extend geographic access to health care, whilst stock levels and prices of various commodities, medical 38% use it to improve on data management. According information (Global Pulse, 2012). The introduction of to Raghupathi (2010) as citied by Raghupathi and new technologies is helping to drive a wave of innovation Raghupathi (2014), the healthcare industry historically across the African financial services sector as banks has generated large amounts of data, driven by record create new and accessible banking channels and take keeping. Ghana is not very different. The health industry banking services to previously unbanked parts of society Science and Development Volume 2, No. 1, 2018 Afful et al • Redefining the Concept of Big Data: A Ghanaian Perspective 48 in Ghana generates millions of data records, but most of Another key consideration associated with the these are stored in hard copy form, whereas the current significant growth in data volume of financial institutions trend is toward rapid digitization of these large amounts is risk in the form of fraud. Constant vigilance and of data. 
A number of health facilities are now moving deterrence through technology is the key to protection towards digital records, but currently all these efforts are and employing big data technology is a key measure to segmented and disjointed. To derive benefit from the prevent attacks (Kothai, 2015). In implementing Big digitization process, these efforts by individual facilities Data in the financial industry, the proposed framework in have to be coordinated and centralized (Asangansi and Figure 5 is proposed. Braa, 2010). Effort must be made to implement an architectural platform onto which individual agencies can simply ‘plug in and play’. Health policy makers in Challenges in implementing Big Data in Ghana Ghana must provide vision and develop the required In practice, Big Data as a technology faces many challeng- strategies necessary to achieving a fully integrated health es, one of which is heterogeneity and incompleteness. information system for the country. Since computer systems work most efficiently if they can store multiple items that are all identical in size and structure, the efficient representation, access and analysis Financial Services (Banking) of unstructured or semi-structured data poses analytical There is convincing evidence that business has now and storage difficulties. Another challenge is with the recognized the ascendancy of data in the business volume of data to be worked on within an organization. sphere. In a survey recently conducted by Capgemini Managing large and rapidly increasing volumes of data and the Economist of over 600 global business leaders, can be challenging and requires that faster processing three-quarters of business leaders agreed that their components and storage systems be designed and built. organizations were data driven, and 90% of them, besides Also, with large data sets to be processed, speed could be land, labour and capital, recognized information as the an issue to deal with. 
fourth factor of production (Gobble, 2013). This is because the larger the data set to be processed, the Ghana’s banking sector has transformed from traditional longer it will take to analyse. Another challenge is privacy. walk-in and operate transactions to online and electronic For instance, there are strict laws governing what can and banking operated venture where the presence of the cannot be done with electronic health records. Big data customer is not really needed. The sector has expanded raises concerns and fears regarding the inappropriate substantially over the last decade. The financial sector use of personal data, particularly through linking of data generates and stores massive amounts of data about from multiple sources. customers. According to Suresh (2012), data from the banking industry indicate that banks in the Ghanaian The implementation of Big Data in Ghana comes with its markets spend up to10 percent of their operating income own challenges apart from the ones discussed above. Big on data management. Data has much potential for development. Big Data for development has been defined by Global Pulse (2013), In Ghana, despite the challenges in managing and to mean the identification of sources of big data relevant securing customer data, Fidelity Bank, a mid-sized to policy and planning of development programmes. This financial firm that has grown over the last ten years to concept is distinct from both “traditional” development become one of Ghana’s leading financial institutions, has data concept and what the private sector and mainstream invested in a comprehensive, Big Data solution (Suresh, media refer to as Big Data. 2012). Science and Development Volume 2, No. 
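The heterogeneity and incompleteness challenge raised in this section can be made concrete with a small illustration. The following is a minimal Python sketch, not a description of any agency's actual system: the source names ("health", "pension"), the field names, the common schema and the sample records are all hypothetical assumptions introduced only for illustration. It shows how records that arrive with different field names and missing values might be mapped onto one common structure while keeping the gaps visible.

```python
from typing import Any, Dict, List, Optional

# Hypothetical common schema; these field names are illustrative assumptions,
# not taken from any agency's actual database.
COMMON_FIELDS = ("person_id", "full_name", "date_of_birth")

# Hypothetical mappings from each source's own field names to the common schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "health": {"patient_no": "person_id", "name": "full_name", "dob": "date_of_birth"},
    "pension": {"member_id": "person_id", "fullName": "full_name", "birthDate": "date_of_birth"},
}


def normalise(source: str, record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Map one source-specific record onto the common schema.

    Fields that the source did not supply (or left empty) stay as None, so
    incompleteness remains visible instead of being silently dropped.
    """
    mapping = FIELD_MAPS[source]
    unified: Dict[str, Optional[Any]] = {field: None for field in COMMON_FIELDS}
    for source_field, value in record.items():
        target = mapping.get(source_field)  # unknown source fields are ignored
        if target is not None and value not in ("", None):
            unified[target] = value
    return unified


def missing_fields(unified: Dict[str, Optional[Any]]) -> List[str]:
    """Return the common-schema fields that are still empty."""
    return [field for field in COMMON_FIELDS if unified[field] is None]


# Made-up sample records from two differently structured sources.
health_row = normalise("health", {"patient_no": "H-001", "name": "Ama Mensah", "dob": "1990-04-12"})
pension_row = normalise("pension", {"member_id": "P-778", "fullName": "Kofi Boateng", "birthDate": ""})

print(missing_fields(health_row))
print(missing_fields(pension_row))
```

In this sketch the per-source mapping tables are kept separate from the normalising function, so that supporting a further source would only require adding one more mapping; whether such a design suits a real shared platform would depend on the data actually held by each agency.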
Fig. 5: Framework for big data implementation in the health sector
[Figure labels: Unstructured Data; Social Media Data; Structured Data; Mobile Banking; Internet Banking; ATM; Teller Transactions; E-Commerce; E-Zwich Services; Other Service (Insurance, Utility Bills, School Fees); Security Infrastructure; Internal/External Com.; Customer Live Events; Mobile Data; Financial News; Compliance Log Files; Data Cleaning for Data Quality; Clean Data; Anomaly Detection (Fraud Detection); Different Data Patterns/Representation; Content with Malicious Code; Content without Malicious Code]

The lack of infrastructural support and the right technology is also a challenge to the implementation of big data in Ghana. Another challenge is the unavailability of skilled personnel with the knowledge and skills related to big data analytics. The Commonwealth of Australia (2013) suggested that since there is a shortage of university degrees that have a curriculum focused on big data analytics, it is important for education providers to design courses geared towards the education and training of big data scientists.

Research leadership has a key role to play in the realization of the benefits big data has to offer for the Ghanaian economy. Research funding should be provided to educational institutions to run courses and training programmes aimed at producing the cadre of personnel with core skills to drive development in the field. Collaboration between government agencies and research/academic institutions will bring more opportunities for skills development and training. Government agencies should also create procedures and practices that provide an enabling environment for responsible data analytics (Commonwealth of Australia, 2013).

Privacy and security are an essential part of every society. Individuals and organizations have data that they protect. For anyone wishing to explore Big Data for development, privacy and security are a primary concern, since they have implications for all areas of work, from data acquisition and storage to retention, use and presentation (Global Pulse, 2013). Data anomalies are normally not detected at the early stages of data analysis and very often they are not discovered in real time. It is imperative to take note of the type of technology used in order to combat the anomalies.

Conclusion

This paper aimed to identify the potential for development and use of big data in current information administration in Ghana. The adoption of such practices by various institutions or organizations and government agencies in Ghana will enable these organizations to take full advantage of Big Data technologies. This will permit agencies to deliver better-quality and integrated services, improve policy development and identify new services and opportunities to make use of the national information assets, that is, Ghana government data and other data collected by the various agencies in the country. We have reviewed some of the technologies currently being used and proposed a functional definition of big data. We conclude that if the government harnesses the potential of big data to analyse data sets that are generated by the different agencies in the Ghanaian economy, this could improve government operations, policy development and service delivery for rapid economic development. There is also the need to strengthen research institutions to provide the leadership required to drive this effort.

References

Andrade P.L., Hemerly J., Recalde G. and Ryan P.S. (2014). From Big Data to Big Social and Economic Opportunities: Which Policies Will Lead to Leveraging Data-Driven Innovation's Potential? The Global Information Technology Report – World Economic Forum. p. 81-86

Asangansi I. and Braa K. (2010). The emergence of mobile-supported national health information systems in developing countries. Stud Health Technol Inf 2010; 160(Pt 1): 540–4.

Commonwealth of Australia. (2013). Big Data Strategy - Issues Paper. Available at: https://www.finance.gov.au/files/2013/03/Big-Data-Strategy-Issues-Paper1.pdf

Fosso Wamba S., Akter S., Edwards A., Chopin G. and Gnanzou D. (2015). "How 'Big Data' Can Make Big Impact: Findings from a Systematic Review and a Longitudinal Case Study," International Journal of Production Economics. 165, p. 234-246

Gangadharan J. (2014). 7 Ways to Leverage the Data Goldmine with Big Data and Analytics. Global Practices, Domain Expertise, Customer Experience

Global Pulse. (2012). Big Data for Development: Challenges and Opportunities. Available at: http://www.unglobalpulse.org/sites/default/files/BigDataforDevelopment-UNGlobalPulseJune2012.pdf

Global Pulse. (2013). Big Data For Development: A Primer Harnessing Big Data for Real-Time Awareness. Available at: http://www.unglobalpulse.org/sites/default/files/Primer%202013_FINAL%20FOR%20PRINT.pdf

Gobble M.M. (2013). Big Data: The Next Big Thing in Innovation. Research-Technology Management, Vol. 56, No. 1. p. 64-66

Government of Ghana e-Services Portal. (2011). Available at: http://www.eservices.gov.gh/SitePages/Portal-Home.aspx

Hammond W.E., Bailey C., Boucher P., Spohr M. and Whitaker P. (2010). Connecting Information To Improve Health. Health Aff (Millwood) 2010. Feb 1; 29(2): 284–8.

Kothai M. (2015). How to use big data to combat fraud. Big Data Science and Technology. World Economic Forum. https://agenda.weforum.org/2015/01/how-to-use-big-data-to-combat-fraud/

Lewis T., Synowiec C., Lagomarsino G. and Schweitzer J. (2012). E-health in low- and middle-income countries: Findings from the center for health market innovations. Bull World Health Organ; 90(5): 332–40.

Manyika J., Chui M., Brown B., Bughin J., Dobbs R., Roxburgh C. and Byers A.H. (2011). Big data: The next frontier for innovation, competition and productivity. McKinsey Global Institute. Available at: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

Raghupathi W. (2010). Data Mining in Health Care. In Health care Informatics: Improving Efficiency and Productivity (Edited by Kudyba S.). Taylor & Francis. p. 211-223

Raghupathi W. and Raghupathi V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems 2 (1) (2014), p. 3

Suresh K.L. (2012). Managing the Big Data Challenge in Ghana's Banking Sector. Building a Smarter Planet. Available at: http://asmarterplanet.com/blog/2012/11/21189.html