Big Data Research 2 (2015) 59–64
Contents lists available at ScienceDirect
Big Data Research
www.elsevier.com/locate/bdr
Signiﬁcance and Challenges of Big Data Research ✩
Xiaolong Jin a,∗, Benjamin W. Wah a,b, Xueqi Cheng a, Yuanzhuo Wang a
a CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
b The Chinese University of Hong Kong, Shatin, NT, Hong Kong, China
Article info
Article history:
Received 14 November 2014
Accepted 11 January 2015
Available online 26 February 2015
Keywords:
Big data
Data complexity
Computational complexity
System complexity
Abstract
In recent years, the rapid development of the Internet, the Internet of Things, and cloud computing has led to
the explosive growth of data in almost every industry and business area. Big data has rapidly developed
into a hot topic that attracts extensive attention from academia, industry, and governments around the
world. In this position paper, we ﬁrst brieﬂy introduce the concept of big data, including its deﬁnition,
features, and value. We then identify from different perspectives the signiﬁcance and opportunities that
big data brings to us. Next, we present representative big data initiatives all over the world. We describe
the grand challenges (namely, data complexity, computational complexity, and system complexity), as
well as possible solutions to address these challenges. Finally, we conclude the paper by presenting
several suggestions on carrying out big data projects.
© 2015 Elsevier Inc. All rights reserved.
1. Introduction
In recent years, big data has rapidly developed into a hotspot
that attracts great attention from academia, industry, and even
governments around the world [1–3]. Nature and Science have
published special issues dedicated to discussing the opportunities and
challenges brought by big data [4,5]. McKinsey, the well-known
management consulting firm, claimed that big data has penetrated
into every area of today’s industry and business functions
and has become an important factor in production [6]. Using and
mining big data heralds a new wave of productivity growth and
consumer impetus. O’Reilly Media even asserted that “the future
belongs to the companies and people that turn data into products”
[7]. Some even say that big data can be regarded as the new
petroleum that will power the future information economy. In
short, the era of big data has arrived.
What is big data? So far, there is no universally accepted definition.
In Wikipedia, big data is deﬁned as “an all-encompassing
term for any collection of data sets so large and complex that it
becomes diﬃcult to process using traditional data processing applications”
[8]. From a macro perspective, big data can be regarded
as a bond that subtly connects and integrates the physical world,
human society, and cyberspace. Here the physical world has a
reflection in cyberspace, embodied as big data, through the Internet,
the Internet of Things, and other information technologies, while
✩ This article belongs to Visions on Big Data.
* Corresponding author.
E-mail address: jinxiaolong@ict.ac.cn (X. Jin).
human society generates its big data-based mapping in cyberspace
by means of mechanisms like human–computer interfaces,
brain–machine interfaces, and the mobile Internet [9–11]. In this sense, big
data can basically be classiﬁed into two categories, namely, data
from the physical world, which is usually obtained through sensors,
scientiﬁc experiments and observations (such as biological
data, neural data, astronomical data, and remote sensing data), and
data from human society, which is often acquired from such
sources or domains as social networks, the Internet, health, finance,
economics, and transportation.
Compared to traditional data, the features of big data can be
characterized by five Vs, namely, huge Volume, high Velocity, high
Variety, low Veracity, and high Value. The main difficulty in coping
with big data does not lie only in its huge volume, as we
may alleviate this issue to some extent by reasonably expanding
or extending our computing systems. Actually, the real challenges
center around the diversified data types (Variety), timely response
requirements (Velocity), and uncertainties in the data (Veracity).
Because of the diversified data types, an application often needs
to deal with not only traditional structured data, but also semi-structured
or unstructured data (including text, images, video, and
voice). Timely responses are also challenging because there may
not be enough resources to collect, store, and process the big data
within a reasonable amount of time. Finally, distinguishing between
true and false or reliable and unreliable data is especially
challenging, as even the best data cleaning methods cannot eliminate
some of the inherent unpredictability of data.
Big data is, beyond all doubt, of great value. From
the perspective of the information industry, big data is a strong
http://dx.doi.org/10.1016/j.bdr.2015.01.006
2214-5796/© 2015 Elsevier Inc. All rights reserved.
impetus to the next generation of the IT industry, which is essentially
built on the third platform, mainly referring to big data, cloud
computing, mobile Internet, and social business. IDC predicted that
by 2020 the market size of the third IT platform would reach US$5.3
trillion, and that from 2013 to 2020, 90% of the growth in the
IT industry would be driven by the third IT platform. From the
socio-economic point of view, big data is the core substance and
critical support of the so-called second economy, a concept proposed
by the American economist W.B. Arthur in 2011 [12], which
refers to the economic activities running on processors, connectors,
sensors, and executors. It is estimated that by 2030 the size of the
second economy will approach that of the first economy (namely,
the traditional physical economy). The main support of the second
economy is big data, as it is an inexhaustible and constantly
growing resource. In the future, by virtue of big data, competitiveness
in the second economy will rest no longer on labor
productivity but on knowledge productivity.
2. Signiﬁcance of big data
Due to its great value, big data has been essentially changing
and transforming the way we live, work, and think [1]. In what
follows, we describe in detail the significance of big data from various
perspectives.
2.1. Signiﬁcance to national development
At present, the world has fully entered the information
age. The extensive use of the Internet, the Internet of Things,
cloud computing, and other emerging information technologies has caused
data sources to grow at an unprecedented rate, while
making the structures and types of data increasingly complex.
In-depth analysis and utilization of big data will play an important
role in promoting the sustained economic growth of countries and enhancing
the competitiveness of companies.
In the future, big data will become a new source of economic
growth. With big data, companies will upgrade and transform to
the mode of Analysis as a Service (AaaS), thereby changing the
ecology of the IT and other industries. In this context, the global
giants of the IT industry (such as IBM, Google, Microsoft, and Oracle)
have already begun their technical development planning in
the big data era.
At the national level, the capacity of accumulating, processing,
and utilizing vast amounts of data will become a new landmark
of a country’s strength. The data sovereignty of a country in cyberspace
will be another great power-game space besides land, sea,
air, and outer space.
In China, a government report has clearly proposed that cyberspace,
as well as the deep sea and deep space, are key areas of
its core national interests. Lagging behind in the field of big data
research and applications not only means the loss of a country’s industrial
strategic advantage, but also suggests loopholes in its national
security in cyberspace. In this sense, the Big Data Research and Development
Initiative¹ [13], announced by the United States in March
2012, is not only a strategic plan that helps the US continue
to lead in high-tech fields, but also a plan to protect its
national security and advance its socio-economic development.
In general, the Western countries, represented by the United
States, are moving under their national agenda towards a modernization
of their national strength through big data research and
applications. It is anticipated that future economic and political
competitions among countries will be based on exploiting the potential
of big data, among other traditional aspects. In short, the
1 http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf.
research and applications of big data are of strategic importance
and signiﬁcance for improving the competitiveness of any country.
2.2. Signiﬁcance to industrial upgrades
Big data is currently a common problem faced by many industries,
and it brings grand challenges to these industries’ digitization
and informationization. Research on common problems of big data,
especially on breakthroughs of core technologies, will enable industries
to harness the complexity induced by data interconnection
and to master uncertainties caused by redundancy and/or shortage
of data. Everyone hopes to mine demand-driven information,
knowledge and even intelligence from big data and ultimately to take
full advantage of the great value of big data. This means that data is
no longer a byproduct of the industrial sector, but has become a
key nexus of all aspects. In this sense, the study of common problems
and core technologies of big data will be the focus of the new
generation of IT and its applications. It will not only be the new
engine to sustain the high growth of the information industry, but
also the new tool for industries to improve their competitiveness.
For example, in recent years, cloud computing has rapidly
evolved from a vague concept into a mature and
popular technology. Many big companies, including Google, Microsoft,
Amazon, Facebook, Alibaba,² Baidu,³ Tencent,⁴ and other IT giants,
are working on cloud computing technologies and cloud-based
computing services. Big data and cloud computing are seen as two
sides of a coin: big data is a killer application of cloud computing,
whereas cloud computing provides the IT infrastructure for big
data. The tightly coupled nexus of big data and cloud computing is
expected to change the ecosystem of the Internet, and even affect the
pattern of the entire information industry.
2.3. Signiﬁcance to scientiﬁc research
Big data has caused the scientiﬁc community to re-examine its
methodology of scientiﬁc research [14] and has triggered a revolution
in scientiﬁc thinking and methods.
It is well-known that the earliest scientific research in human
history was based on experiments. Later on, theoretical science
emerged, which was characterized by the study of various laws
and theorems. However, because theoretical analysis is often too complex
to be feasible for solving practical problems, people began
to seek simulation-based methods, which led to computational science.
The emergence of big data has spawned a new research
paradigm; that is, with big data, researchers may only need to ﬁnd
or mine from it the required information, knowledge and intelligence.
They do not even need to directly access the objects to be
studied. In 2007, the late Turing Award winner, Jim Gray, depicted
in his last speech the fourth paradigm of data-intensive scientiﬁc
research [14], which separates data-intensive science from computational
science. Gray believed that the fourth paradigm may
be the only systemic way for solving some of the toughest global
challenges we face today. In essence, the fourth paradigm is not
only a change in the way of scientiﬁc research, but also a change
in the way that people think [1].
2.4. Signiﬁcance to emerging interdisciplinary research
Big data technologies and the corresponding fundamental research
have become a research focus in academia. An emerging
2 http://www.alibaba.com/.
3 http://www.baidu.com/.
4 http://www.tencent.com/.
interdisciplinary field called data science [15] has been gradually
taking shape. It takes big data as its research object
and aims at generalizing the extraction of knowledge from data.
It spans many disciplines, including information science,
mathematics, social science, network science, system science, psychology,
and economics [16,7]. It employs various techniques and
theories from many fields, including signal processing, probability
theory, machine learning, statistical learning, computer programming,
data engineering, pattern recognition, visualization, uncertainty
modeling, data warehousing, and high performance computing.
Many research centers/institutes on big data have been established
in recent years in different universities throughout the
world (such as the University of California at Berkeley, Columbia
University, New York University, Tsinghua University, Eindhoven
University of Technology, and the Chinese University of Hong Kong).
Many universities and research institutes have even set up undergraduate
and/or postgraduate courses on data analytics to train
data scientists and data engineers.
2.5. Signiﬁcance to helping people better perceive the present
Big data, especially big networked data, contains a wealth of
societal information and can thus be viewed as a network mapped
to society. Consequently, analyzing big data and then summarizing
and finding the clues and laws it implicitly contains can help us better
perceive the present.
For instance, two indices of interest developed in
China make great use of data publicly available from the Internet.
Since 2007, the China Survey and Assessment Center, affiliated with
Renmin University of China, has issued the annual “China Development
Index.” This index, with four individual indices on health, education,
living standard, and social environment, is intended to measure
the status quo and diagnose the problems of China’s development.
It provides a scientific basis for a reasonable measure of the
overall development of China. As another effort, since 2010, Xinhua
News Agency, together with Dow Jones Newswires, has published
the “Xinhua-Dow Jones International Financial Centers
Development Index” twice a year. By comparing and analyzing various subjective
and objective indicators and by combining qualitative and
quantitative analysis, this index reveals the current development
status and laws of international financial centers.
Deeply mining the information contained in big data can also help
people make better decisions. For example, in the presidential
election of the United States in November 2012, Barack Obama’s
campaign team analyzed big data to help Obama
beat Romney and get re-elected.⁵ In the eighteen months before
Election Day, Obama’s data analysis team created a huge data
processing system. Through real-time data collection and analysis,
not only could it tell the campaign team how to find voters and
get their attention, but it also analyzed voters’ tendencies
to vote. Every night, the data analysis team ran simulations
of the election and presented the results the next day
to help estimate the likelihood that Obama would win in particular
areas, based on which the team could allocate resources more precisely.
Later facts demonstrated that the data analysis team played
a crucial role in Obama’s re-election, far beyond what people had imagined.
Analyzing and mining big data can also effectively safeguard
public security and combat crime, including economic crime. For example,
in 2012 big data analysis played a major role in solving
the criminal case of Zhou Kehua,⁶ a notorious serial killer and robber
in China who died in a shootout with police. After the series
of armed robberies and homicides in which Zhou was a suspect, police
conducted a comprehensive examination of a massive variety
of video data and successfully obtained footage in which the suspect
bought breakfast without any disguise. Upon this finding,
they tracked Zhou to the Internet cafe that he regularly visited
and successfully acquired two clear shots of the suspect’s face when
he accessed the Internet. From the observation that he preferred
browsing Web sites related to Sichuan and Chongqing in
China, police inferred that the suspect came from the area where the
Sichuan dialect is spoken. Based on a summary analysis of the various pieces
of information obtained from big data, police put together the suspect’s
characteristics and actions when committing the crimes. The analysis
played a decisive role in helping police deploy their forces and
eventually capture Zhou.
5 http://www.technologyreview.com/featuredstory/508836/how-obama-used-big-data-to-rally-voters-part-1/.
6 http://en.wikipedia.org/wiki/Zhou_Kehua.
2.6. Signiﬁcance to helping people better predict the future
Through effective integration and accurate analysis of multisource
heterogeneous big data, better predictions of future trends
of events can be achieved. Big data analysis may
even promote the sustainable development of society and the economy
and further give birth to new industries related to data services.
The capabilities of big networked data analysis have been highly developed
and effectively applied in the fields of security and the military.
As an example, as early as 2010, the United States released a
report entitled “Chinese Nuclear Warhead Storage and Handling
System” [17], which claimed that the US had found nuclear bases of
China in areas like Shaanxi, Jiangxi, and Sichuan. The report even
presented the names of cities and counties where the nuclear
bases were located. This report caused a sensation on a global
scale. Through this report, the Project 2049 Institute⁷ of the United
States came to public attention. Founded in Washington, DC, in
2008, this institute makes use of publicly available data and documents
(such as journals and conference papers) to analyze and
predict security issues in China related to its military and economy.
They completed the report through vertical searches and elaborate,
systematic analysis of big data. In March 2013,
the Institute also released a research report on China’s Unmanned
Aerial Vehicle (UAV) project [18], which conducted a comprehensive
analysis of the research, development, equipment, and operational
deployment of UAVs in China. They also hypothesized that
in the future China’s UAVs will be able to locate, track and target
US aircraft carriers in support of long range anti-ship cruise and
ballistic-missile strikes [18].
Big data-based predictive analysis has been applied to address
societal issues, including public health and economic development.
Ginsberg et al. found that, if the volume of queries submitted
to Google with keywords like “flu symptom” and “flu treatment”
increases in a region, then after a few weeks, the number
of influenza patients visiting the emergency rooms of hospitals in the
corresponding area will increase accordingly [19]. With this discovery,
it becomes possible to predict outbreaks of influenza and
deploy countermeasures in advance. On economic development,
the United Nations recently launched a new project, called Global
Pulse [20], which expects to use big data to promote the development
of the global economy. The United Nations will conduct
so-called sentiment analysis, which makes use of natural-language-processing
software to analyze text messages on social networking
sites in order to predict societal issues like unemployment rates,
spending cuts and disease outbreaks in a given region. Its overall
goal is to utilize digital early warning signals to guide assistance
7 http://project2049.net/.
projects in advance in order to prevent an area from falling back into
poverty.
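The query-volume observation attributed to Ginsberg et al. above can be illustrated with a small lagged-correlation search. The following Python sketch is only an illustration and is not the model used in [19]: all data are synthetic, and the names `pearson`, `best_lag`, `TRUE_LAG`, and `WEEKS` are assumptions introduced for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lag(queries, visits, max_lag):
    """Return (lag, r) maximizing the correlation of queries[t] with visits[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = queries[: len(queries) - lag]  # earlier query volumes
        ys = visits[lag:]                   # patient visits `lag` weeks later
        scores[lag] = pearson(xs, ys)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic weekly data (hypothetical): patient visits echo the underlying
# level of public interest TRUE_LAG weeks after it shows up in queries.
rng = random.Random(0)
TRUE_LAG = 2
WEEKS = 60
interest = [100 + rng.gauss(0, 20) for _ in range(WEEKS)]
queries = [level + rng.gauss(0, 5) for level in interest]
visits = [
    0.3 * interest[max(t - TRUE_LAG, 0)] + rng.gauss(0, 1) for t in range(WEEKS)
]

lag, r = best_lag(queries, visits, max_lag=6)
print(f"best lag: {lag} weeks, correlation: {r:.2f}")
```

If query volume is a reliable leading indicator, the lag found in this way suggests how far in advance countermeasures could be prepared; real data would of course require far more careful modeling and validation.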
3. International initiatives on big data
Because of the great signiﬁcance and value of big data, many
countries have launched their plans or initiatives on big data-related
research and applications. In this section, we briefly review
these efforts.
As mentioned in the previous section, in March 2012, the
Obama Administration oﬃcially launched the Big Data Research
and Development Initiative with an investment of more than US$200
million [13]. The initiative involves six federal government
agencies, namely, the Department of Defense (DoD), Defense Advanced
Research Projects Agency (DARPA), Department of Energy
(DoE), National Institutes of Health (NIH), National Science Foundation
(NSF), and US Geological Survey (USGS) [13].
The initiative aims to study new infrastructures and methodologies
for big data research in order to greatly improve the tools
and techniques for acquiring knowledge and insights from big data,
while improving the ability to use big data for scientific discovery.
Specifically, it is intended to develop core technologies to collect,
store, manage, analyze and share large-scale data, and to use these
technologies to accelerate the pace of discovery in science and
engineering, strengthen national security, transform
education and learning, and vigorously cultivate new talent
for developing and using big data technologies. It also prepares
the next generation of data scientists and engineers and particularly
seeks a 100-fold increase in the ability of analysts to extract
information from texts in any language. The initiative engages not
only the government, but also industry, academia and non-profit organizations
to take full advantage of the opportunities
created by big data, exploit its tremendous potential, and drive
the upgrade of industries. In particular, it focuses on the following
application areas: health and well-being, environment and sustainability,
emergency response and disaster resiliency, manufacturing,
robotics and smart systems, secure cyberspace, transportation and
energy, education, and workforce development. This is the second
national-level initiative in the ﬁeld of information technology, after
the “information highway” program in September 1993.
Besides the United States, Britain, France, Australia, and Japan
have also introduced their big data initiatives.
In January 2013, the British government announced a big-data
plan of £189 million. The plan aims to create new
opportunities for using big data in commercial enterprises and research
institutions. It further supports, with capital and policies, the
development of big data in medical, agricultural, commercial, academic
research and other areas.
In February 2013, the French government published the “Digital
Roadmap,”⁸ which invested €11.5 million to support the development
of seven future projects, including big data.
In August 2013, the Australian federal government announced
the Australian Public Service Big Data strategy. It intends to promote
the service reformation of public sectors by making use of
big data analysis, developing better public policies and protecting
citizen privacy in order to make Australia among the world’s most
advanced in the big data ﬁeld.
The Japanese government announced their national big data
strategies, “The Integrated ICT Strategy for 2020” and “Declaration
to be the World’s Most Advanced IT Nation” [21], in 2012
and 2013, respectively. They plan to develop Japan’s new national
IT strategy with open public data and big data as its core during
2013–2020, and ﬁnally promote Japan as a country with the
8 http://www.ambafrance-ca.org/Digital-roadmap.
world’s highest standards in the extensive use of big data in the
information technology industry.
Finally, the European Commission announced Horizon 2020⁹ as
its next framework program for research and innovation, which
invests about €120 million in big data-related industrial research
and applications. The program defines a research and innovation
strategy to guide a successful implementation of the big data economy,
including excellent science, industrial leadership, and societal
challenges. In Horizon 2020, ICT 15 and 16 mainly address industrial
research on big data. Specifically, the former focuses on open
data innovation, whereas the latter focuses on big data research,
including technologies, benchmarks, and support actions (like competitions).
4. Grand challenges of big data
There are many challenges in harnessing the potential of big
data today, ranging from the design of processing systems at the
lower layer to analysis means at the higher layer, as well as a series
of open problems in scientiﬁc research. Among these challenges,
some are caused by the characteristics of big data, some, by its
current analysis models and methods, and some, by the limitations
of current data processing systems. In this section, we brieﬂy
describe the major issues and challenges.
4.1. Data complexity
The emergence of big data has provided us with unprecedented
large-scale samples when dealing with computational problems,
although we now have to face far more complex data objects.
As aforementioned, the typical characteristics of big data are diversiﬁed
types and patterns, complicated inter-relationships, and
greatly varied data quality. The inherent complexity of big data (including
complex types, complex structures, and complex patterns)
makes its perception, representation, understanding and computation
far more challenging and results in sharp increases in the
computational complexity when compared to traditional computing
models based on total data. Traditional data analysis and mining
tasks, such as retrieval, topic discovery, semantic analysis, and
sentiment analysis, become extremely difficult when using big
data. At present, we do not have a good understanding of how to address
the complexity of big data. For instance, we lack knowledge
regarding the distribution laws and association relationships of
big data. We lack a deep understanding of the inherent relationship
between data complexity and computational complexity of big
data, as well as domain-oriented big data processing methods. All
these greatly limit our capacity to design highly efficient computational
models and methods for solving problems using big data.
A fundamental problem is how to formulate or quantitatively
describe the essential characteristics of the complexity of big data.
The study of the complexity theory of big data will help us understand
the essential characteristics and formation of complex patterns in big
data, simplify its representation, obtain better knowledge abstraction,
and guide the design of computing models and algorithms for big
data. To do this, we will need to establish the theory and models
of data distribution under multi-modal interrelationships. We
will also need to sort out the intrinsic connections between data complexity
and spatio-temporal computational complexity. Moreover,
by modeling and analyzing the intrinsic mechanisms of data complexity,
we will be able to develop the principles and mechanisms
for processing big data into a solid foundation for big data computing.
9 http://ec.europa.eu/programmes/horizon2020/.
4.2. Computational complexity
Three of the key features of big data, namely, multiple sources,
huge volume, and rapid change, make it difficult for traditional
computing methods (such as machine learning, information retrieval,
and data mining) to effectively support the processing,
analysis and computation of big data. Such computations cannot
simply rely on past statistics, analysis tools, and iterative algorithms
used in traditional approaches for handling small amounts
of data. New approaches will need to break away from assumptions
made in traditional computations based on independent and
identical distribution of data and adequate sampling for generating
reliable statistics. When solving problems involving big data, we
will need to re-examine and investigate its computability, computational
complexity, and algorithms.
New approaches for big data computing will need to address
big data-oriented, novel and highly eﬃcient computing paradigms,
provide innovative methods for processing and analyzing big data,
and support value-driven applications in speciﬁed domains. New
features in big data processing, such as insuﬃcient samples, open
and uncertain data relationships, and unbalanced distribution of
value density, not only provide great opportunities, but also pose
grand challenges, to studying the computability of big data and the
development of new computing paradigms.
To address the computational complexity of big data applications,
we will need to focus on the whole life cycle of big data
applications in order to study data-centric computing paradigms
based on the characteristics of big data. We need to break away
from traditional computing-centric paradigms, establish data-centric
push-style computing paradigms, and explore a weak CAP
network shared-data system model and its algebraic computational
theory. We will need to develop algorithms for distributed
and streaming computing and form a big data-oriented computing
framework where communication, storage, and computing
are well integrated and optimized. We will have to study
non-deterministic algorithmic theory suitable for big data and depart
from the independent-and-identically-distributed assumption
made in traditional statistical learning. We also need to explore
existing reduction-based computing methods where big data is reduced
on demand from being large enough to being just enough,
and to being valuable enough. Finally, we will need to develop
bootstrapping and sampling-based local computation and approximation
methods and propose a novel theoretical basis for big data
algorithms that scale to large amounts of data.
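As a minimal sketch of the sampling-based approximation idea mentioned above, the following Python example draws a fixed-size uniform sample from a data stream (reservoir sampling, Algorithm R) and then uses the bootstrap to attach a confidence interval to the sample mean. It is an illustration under stated assumptions rather than a method proposed in this paper: the stream is synthetic, and the names `reservoir_sample` and `bootstrap_mean_ci` as well as all parameter values are hypothetical.

```python
import random
import statistics


def reservoir_sample(stream, k, rng):
    """Algorithm R: keep a uniform random sample of k items from a stream
    whose length is not known in advance, using O(k) memory."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item   # item i is kept with probability k / (i + 1)
    return sample


def bootstrap_mean_ci(sample, n_resamples, rng, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean of `sample`."""
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(sample), low, high


# Synthetic stream (assumption): 200,000 exponentially distributed values
# with true mean 20, standing in for data too large to hold in memory.
data_rng = random.Random(1)
algo_rng = random.Random(2)
stream = (data_rng.expovariate(1 / 20.0) for _ in range(200_000))

sample = reservoir_sample(stream, k=2_000, rng=algo_rng)
estimate, low, high = bootstrap_mean_ci(sample, n_resamples=500, rng=algo_rng)
print(f"estimated mean: {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The design choice here is to trade exactness for bounded memory: only 2,000 values are ever stored, and the bootstrap interval quantifies how much accuracy that reduction costs.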
4.3. System complexity
Big data processing systems suitable for handling a diversity of
data types and applications are the key to supporting scientiﬁc research
of big data. The processing of data with huge volume, complex structure, and
sparse value is confronted by high computational
complexity, long duty cycles, and real-time requirements. These requirements
not only pose new challenges to the design of system
architectures, computing frameworks, and processing systems, but
also impose stringent constraints on their operational eﬃciency
and energy consumption.
The design of system architectures, computing frameworks, processing
modes, and benchmarks for highly energy-eﬃcient big data
processing platforms is the key issue to be addressed in system
complexity. Solving these problems can establish the principles for designing,
implementing, testing, and optimizing big data processing
systems. Their solutions will form an important foundation for developing
hardware and software system architectures with energy-optimized
and efficient distributed storage and processing.
Evaluating and optimizing the energy efficiency of big data
processing systems is a great research challenge. Not only do we
need to untangle the relationship between the complexity and
computability of big data applications, and between the efficiency
and energy consumption of processing systems, but we also need to
comprehensively measure a variety of energy efficiency factors,
including system throughput, parallel processing capability, job
calculation accuracy, and energy consumption per unit. We also
have to take actual workload conditions and scattered and repetitive
resources into account. We will need to conduct fundamental
research on performance evaluation, distributed system architectures,
streaming computing frameworks, and online data processing,
while taking into account the value sparsity and weak access
locality of big data, as well as the life cycle of big data
applications. We will also need to investigate validation tools,
including benchmarks and system performance prediction methods.
Through an iterative process of design, implementation, and
validation, we will be able to develop big data processing systems
with high data acquisition throughput, low energy consumption,
and highly efficient computing.
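As an illustration of how the energy efficiency factors listed above might be computed from benchmark runs, consider the following minimal Python sketch. It is not a method proposed in this paper: the `RunMeasurement` record, its field names, and the sample numbers are all hypothetical, and parallel efficiency is taken here as speedup over a baseline run divided by the ratio of worker counts, which is one common convention among several.

```python
from dataclasses import dataclass


@dataclass
class RunMeasurement:
    """One benchmark run of a hypothetical big data job (all fields are assumptions)."""
    records_processed: int  # number of input records handled
    wall_time_s: float      # elapsed wall-clock time in seconds
    energy_joules: float    # total energy drawn by the system during the run
    workers: int            # number of parallel workers used
    accuracy: float         # fraction of correct job results, in [0, 1]


def throughput(run: RunMeasurement) -> float:
    """System throughput in records per second."""
    return run.records_processed / run.wall_time_s


def energy_per_record(run: RunMeasurement) -> float:
    """Energy consumption per unit of work, in joules per record."""
    return run.energy_joules / run.records_processed


def parallel_efficiency(run: RunMeasurement, baseline: RunMeasurement) -> float:
    """Speedup over the baseline divided by the increase in worker count."""
    speedup = baseline.wall_time_s / run.wall_time_s
    return speedup / (run.workers / baseline.workers)


def summarize(run: RunMeasurement, baseline: RunMeasurement) -> dict:
    """Collect the factors into one record so that runs can be compared."""
    return {
        "throughput_rec_per_s": throughput(run),
        "joules_per_record": energy_per_record(run),
        "parallel_efficiency": parallel_efficiency(run, baseline),
        "accuracy": run.accuracy,
    }


# Hypothetical sample data: the same job on 1 worker and on 8 workers.
baseline = RunMeasurement(1_000_000, 100.0, 50_000.0, 1, 0.99)
scaled = RunMeasurement(1_000_000, 25.0, 80_000.0, 8, 0.99)

if __name__ == "__main__":
    for name, value in summarize(scaled, baseline).items():
        print(f"{name}: {value}")
```

In this made-up example the scaled run is four times faster but uses eight times as many workers and more energy per record, which is the kind of trade-off between efficiency and energy consumption that a benchmark would need to expose.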
5. Conclusions
Big data has made a strong impact in almost every sector and
industry today. In this paper, we have briefly reviewed the opportunities
and significance of big data, as well as some grand
challenges that big data brings us. We close with a few suggestions
on how to make a big data project successful.
It is no secret that industry is ahead of academia in big data
research and applications. For example, according to figures that
Alibaba disclosed in March 2014, its data center has stored more
than 100 PB of processed data, equivalent to about 100 million
high-resolution movies. During the most recent “Singles’ Day” (also
known as “Double 11 Day”), Alibaba pulled in USD 9.3 billion
(CNY 57.1 billion) in sales from this shopping event, corresponding
to around 278 million orders. For this annual shopping event,
Alibaba developed a real-time data processing platform called
Galaxy, which can handle 5 million transactions per second. Galaxy
can process about 2 PB of data every day. Industry is more
successful in this respect because it has two essential driving
forces: it genuinely needs to process big data in real time, and it
is required to make better use of the data it collects.
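The storage and throughput figures above can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes), which the text does not specify, and the function names are introduced only for illustration.

```python
PB = 10**15  # bytes in a decimal petabyte (assumption: PB rather than PiB)
GB = 10**9   # bytes in a decimal gigabyte


def bytes_per_item(total_bytes: float, item_count: int) -> float:
    """Average size of one item when total_bytes is split evenly over item_count."""
    return total_bytes / item_count


def average_rate_bytes_per_s(bytes_per_day: float) -> float:
    """Average sustained rate implied by a daily data volume."""
    seconds_per_day = 24 * 60 * 60
    return bytes_per_day / seconds_per_day


# 100 PB described as 100 million high-resolution movies.
movie_size_gb = bytes_per_item(100 * PB, 100_000_000) / GB

# 2 PB processed per day, expressed as an average rate.
galaxy_rate_gb_s = average_rate_bytes_per_s(2 * PB) / GB

if __name__ == "__main__":
    print(f"implied movie size: {movie_size_gb:.1f} GB")
    print(f"average processing rate: {galaxy_rate_gb_s:.1f} GB/s")
```

Under these assumptions, the comparison implies roughly 1 GB per movie, and 2 PB per day corresponds to an average of about 23 GB per second over a full day.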
The successful applications of big data in industry point to the
following necessary conditions for a big data project to succeed.
Firstly, there must be very clear requirements, regardless
of whether they are technical, social, or economic. Secondly, to
work efficiently with big data, we need to explore and find
the kernel structure or kernel data to be processed. Finding kernel
data and structures that are small enough and yet can characterize
the behavior and properties of the underlying big data is
non-trivial because it is very domain-specific. Thirdly, a top-down
management model should be adopted. Although a bottom-up approach
may allow us to solve some niche problems, the isolated
solutions often cannot be combined into a complete solution.
Finally, the goal should be to solve the entire problem with an
integrated solution, rather than striving for isolated successes in a
few aspects. In short, an integrated engineering approach should
be employed in managing a big data project.
Acknowledgements
This work is supported by the National Key Basic Research Program
of China (973 Program) (Nos. 2012CB316303 and
2014CB340401), the National High-Tech R&D Program of China (863
Program) (No. 2014AA015204), the National Natural Science Foundation
of China (Nos. 61232010, 61173008, and 61173064), and
the Seventh Framework Programme (FP7) of the European Union
(No. PIRSES-GA-2012-318939).
References
[1] V. Mayer-Schönberger, K. Cukier, Big Data: A Revolution That Will Transform
How We Live, Work, and Think, Houghton Mifflin Harcourt, 2013.
[2] R. Thomson, C. Lebiere, S. Bennati, Human, model and machine: a complementary
approach to big data, in: Proceedings of the 2014 Workshop on Human
Centered Big Data Research, HCBDR ’14, 2014.
[3] A. Cuzzocrea, Privacy and security of big data: current challenges and future
research perspectives, in: Proceedings of the First International Workshop on
Privacy and Security of Big Data, PSBD ’14, 2014.
[4] Big data, Nature 455 (7209) (2008) 1–136.
[5] Dealing with data, Science 331 (6018) (2011) 639–806.
[6] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A. Hung Byers,
Big data: the next frontier for innovation, competition, and productivity, Tech.
rep., McKinsey Global Institute, 2011, available at:
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation.
[7] C. O’Neil, R. Schutt, Doing Data Science: Straight Talk from the Frontline,
O’Reilly Media, Inc., 2013.
[8] Big data, http://en.wikipedia.org/wiki/Big_data, 2014.
[9] G. Li, X. Cheng, Research status and scientific thinking of big data, Bull. Chin.
Acad. Sci. 27 (6) (2012) 647–657.
[10] Y. Wang, X. Jin, X. Cheng, Network big data: present and future, Chinese J. Comput.
36 (6) (2013) 1125–1138.
[11] X.-Q. Cheng, X. Jin, Y. Wang, J. Guo, T. Zhang, G. Li, Survey on big data system
and analytic technology, J. Softw. 25 (9) (2014) 1889–1908.
[12] W.B. Arthur, The second economy, available at:
http://www.images-et-reseaux.com/sites/default/files/medias/blog/2011/12/the-2nd-economy.pdf, 2011.
[13] T. Kalil, Big data is a big deal, available at:
http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal, 2012.
[14] T. Hey, S. Tansley, K. Tolle (Eds.), The Fourth Paradigm: Data-Intensive Scientific
Discovery, Microsoft Corporation, 2009.
[15] Data science, http://en.wikipedia.org/wiki/Data_science, 2014.
[16] M. Loukides, What Is Data Science?, O’Reilly Media, Inc., 2011.
[17] M.A. Stokes, China’s nuclear warhead storage and handling system, Tech. rep.,
2049 Project Institute, March 2010.
[18] I.M. Easton, L.R. Hsiao, The Chinese people’s liberation army’s unmanned aerial
vehicle project: organizational capacities and operational capabilities, Tech.
rep., 2049 Project Institute, March 2013.
[19] J. Ginsberg, M.H. Mohebbi, R.S. Patel, L. Brammer, M.S. Smolinski, L. Brilliant,
Detecting influenza epidemics using search engine query data, Nature 457 (7232)
(2009) 1012–1014.
[20] Big data for development: challenges & opportunities, available at:
http://www.unglobalpulse.org/projects/BigDataforDevelopment, May 2012.
[21] Declaration to be the world’s most advanced IT nation, available at:
http://japan.kantei.go.jp/policy/it/2013/0614_declaration.pdf, June 2013.