As projections and expectations for big data accelerate, enterprise data groups find themselves working in a rapidly evolving arena that is both encouraging in its possibilities and vexing in its limitations. In 2018, big data will continue along both lines, offering more options with greater accessibility while frustrating enterprises looking for all-encompassing answers to complex questions. For those just beginning to participate in the big data boom and those already fully involved, we offer:
Four reasons to be excited:
- Machine-learning methods become more accessible
- Data will not be in short supply
- Big data tools reach more effectively into the enterprise
- Infrastructure rises to support big data volume and velocity
Four reasons to be worried:
- Necessary skills are in critically short supply
- Privacy concerns become actionable
- Data interoperability remains limited
- Security flaws threaten data integrity
Excited:
Machine-learning methods become more accessible
The emergence of production-ready machine-learning tools and models continues to be a reason to be excited about big data in 2018. Machine-learning models can accurately recognize specific patterns in data streams. In environments already inundated with data, this capability provides high value and distinct advantages, and the industry has responded accordingly.
Data scientists can take advantage of a growing number of open-source machine-learning frameworks, including Google’s TensorFlow, Apache MXNet, Facebook’s Caffe2, and Microsoft’s Cognitive Toolkit, among others. Most important, the task of building models has never been easier. For example, Amazon Web Services (AWS) offers deep learning AMIs (Amazon Machine Images) with the leading ML frameworks already built in and ready for use on the AWS cloud. For those just starting, Google’s TensorFlow Playground helps users learn more about the neural networks underlying machine-learning frameworks, using simple data sets and pre-trained models (Figure 1).
Figure 1. TensorFlow Playground offers an interactive sandbox for exploring the foundations of TensorFlow. (Source: Google)
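To make the idea behind such a sandbox concrete, the following minimal sketch trains a single artificial neuron (a perceptron) to separate two clusters of points, using only the Python standard library. The data points, function names, and learning rate are illustrative assumptions; none of them come from TensorFlow Playground or from TensorFlow itself.

```python
# Minimal perceptron sketch: one neuron learning to separate two point clusters.
# All names and data below are hypothetical, chosen only for illustration.

# Toy 2-D data set: (x1, x2) -> label (1 = upper-right cluster, 0 = lower-left cluster)
TRAINING_DATA = [
    ((2.0, 3.0), 1),
    ((3.0, 2.5), 1),
    ((2.5, 3.5), 1),
    ((-2.0, -1.5), 0),
    ((-3.0, -2.0), 0),
    ((-1.5, -3.0), 0),
]


def predict(weights, bias, point):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, point))
    return 1 if activation > 0 else 0


def train_perceptron(data, learning_rate=0.1, max_epochs=100):
    """Adjust weights after each misclassified point until a full pass has no errors."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for point, label in data:
            error = label - predict(weights, bias, point)
            if error != 0:
                errors += 1
                # Nudge the decision boundary toward the misclassified point.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, point)]
                bias += learning_rate * error
        if errors == 0:
            break  # every training point is classified correctly
    return weights, bias


weights, bias = train_perceptron(TRAINING_DATA)
predictions = [predict(weights, bias, point) for point, _ in TRAINING_DATA]
print(predictions)
```

A single neuron like this can only draw a straight dividing line. Stacking many such units into layers is what allows neural networks to fit more complex patterns.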
Even without taking a deep dive into the inner workings of machine-learning algorithms, developers can begin to apply these techniques to data sets. Google’s TensorFlow offers pre-trained models, examples, and walk-throughs for applications such as natural language processing, audio recognition, and image recognition, among others.
Using machine learning has become simpler even for more experienced users. Introduced by Facebook and Microsoft, the Open Neural Network Exchange (ONNX) format provides a standard for moving models between ML frameworks. Besides early support for Caffe2 and Cognitive Toolkit by these companies, Amazon recently joined the effort with an open-source Python package for importing ONNX models into Apache MXNet.
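The general idea of an exchange format can be illustrated without any ML framework: one tool writes a model's structure and weights in a neutral layout, and another tool reads that layout back and reproduces the same outputs. The sketch below is a conceptual illustration only. It uses JSON and hypothetical helper functions (export_model, import_model), and it does not reproduce the actual ONNX file format or the API of any framework named above.

```python
import json

# Hypothetical "framework A": holds a trained linear model in its own internal form.
framework_a_model = {"w": (0.5, -1.0), "b": 0.25}


def run_in_framework_a(model, inputs):
    """Evaluate the model using framework A's internal representation."""
    return model["b"] + sum(w * x for w, x in zip(model["w"], inputs))


def export_model(model):
    """Serialize the model into a neutral, framework-independent JSON string."""
    neutral = {
        "format_version": 1,   # assumed version field, for illustration only
        "op": "linear",        # the single operator this toy format supports
        "weights": list(model["w"]),
        "bias": model["b"],
    }
    return json.dumps(neutral)


def import_model(payload):
    """Hypothetical "framework B": rebuild a callable model from the neutral form."""
    neutral = json.loads(payload)
    if neutral["op"] != "linear":
        raise ValueError("unsupported operator: " + neutral["op"])
    weights, bias = neutral["weights"], neutral["bias"]

    def run(inputs):
        return bias + sum(w * x for w, x in zip(weights, inputs))

    return run


payload = export_model(framework_a_model)
framework_b_model = import_model(payload)

sample = (2.0, 1.0)
same_output = run_in_framework_a(framework_a_model, sample) == framework_b_model(sample)
print(same_output)
```

A real exchange format has to describe far more than this toy layout does, including many operator types, tensor shapes, and data types, which is why a shared standard is more practical than pairwise converters between frameworks.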