№1, 2016

BIG PROSPECTS AND PROBLEMS OF BIG DATA TECHNOLOGY
Imamverdiyev Yadigar N.

Big Data covers technologies and tools for collecting, processing, analyzing, and extracting useful knowledge from large volumes of structured and unstructured data generated at high speed by diverse sources. Recent scientific and popular literature promotes Big Data as a technology that opens new prospects and brings revolutionary changes to e-government, business, health care, science, industry, and other fields. To assess how well the arguments behind these claims hold up and to help choose the right Big Data strategy, this paper critically examines the essence, characteristics, basic building blocks, and analytical capabilities of Big Data, and identifies its advantages, prospects, and existing problems (pp. 21-30).

Keywords: Big Data; Big Data analytics; Data Mining; Hadoop; predictive model