Big Data costs time and money to acquire, store, process and interpret, so it has to provide important insights to be worth the effort. Analytics is the field that discovers meaningful patterns in data and presents them in a form people can understand. Even many large companies lack the hardware and technical expertise to do Big Data analysis in-house. This creates a business opportunity for data handling companies to provide those services, so that a client company only has to supply its data and the questions it wants answered. One such provider is Google, which has expanded its huge farms of digital storage and high-speed internet connectivity to offer large-scale cloud storage to Big Data users.
Similarly, IBM has created a business called Predictive Analytics that rents out the tools and expertise to analyze data for other corporations.
Here is what IBM says Predictive Analytics does:
Predictive analytics brings together advanced analytics capabilities spanning ad-hoc statistical analysis, predictive modeling, data mining, text analytics, entity analytics, optimization, real-time scoring, machine learning and more.
Notice that these are details of the analytical processes we discussed above. Each is a career specialty in its own right, and covering them all within one company would be a large and expensive undertaking. So IBM, Google and other data handling organizations offer a variety of tools and analyses as services. Big Data analysis is becoming a commodity.
Traditionally, there have been two ways to use Big Data. The first is for a company to develop the analytical capabilities (people, expertise, hardware, software) to do it all in-house. The second is to hire Google, IBM or another contractor to provide the tools and expertise needed to investigate the company's unique problems.
A third approach, which is evolving rapidly, is for teams of experts to build standard Big Data analysis tools that solve well-defined problems many institutions share. Practical tools of this kind are already available for various businesses. One example is IBM's Predictive Analytics for Student Performance package. Using assessment scores, demographic information, survey responses, attendance records and other data for each student, IBM sells school districts or entire state education systems predictions of which students are likely to do well or poorly on upcoming assessments and in AP classes, and which are likely to graduate. IBM's Predictive Analytics software also recommends specific interventions for individual students to improve their performance. Such personalized education is motivated both by the goal of graduating students with better chances of success in life and by educational funding that increasingly depends on student performance scores. The salaries and bonuses of teachers, principals and superintendents are also commonly tied to improved results.
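IBM's actual models are not described here, so what follows is only a minimal sketch of the general idea behind a student-performance prediction. It assumes a logistic regression model, which is one common predictive-modeling technique and not necessarily what IBM uses. It also uses two hypothetical features (attendance rate and prior assessment score, both as fractions from 0 to 1) and a small set of invented training records. The model is fit by gradient descent, and any student whose predicted chance of passing falls below a chosen threshold is flagged for possible intervention.

```python
import math

# Hypothetical toy records: (attendance_rate, prior_score_fraction, passed_next_test).
# These numbers are invented for illustration only; they are not real student data.
TRAINING_DATA = [
    (0.95, 0.85, 1),
    (0.90, 0.70, 1),
    (0.85, 0.60, 1),
    (0.80, 0.75, 1),
    (0.70, 0.50, 0),
    (0.60, 0.40, 0),
    (0.55, 0.65, 0),
    (0.50, 0.30, 0),
]


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(data, learning_rate=0.5, epochs=5000):
    """Fit weights w1, w2 and bias b by batch gradient descent on log-loss."""
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w1 = grad_w2 = grad_b = 0.0
        for attendance, score, label in data:
            # Prediction error for this student (predicted probability minus actual outcome).
            error = sigmoid(w1 * attendance + w2 * score + b) - label
            grad_w1 += error * attendance
            grad_w2 += error * score
            grad_b += error
        # Step each parameter against its average gradient.
        w1 -= learning_rate * grad_w1 / n
        w2 -= learning_rate * grad_w2 / n
        b -= learning_rate * grad_b / n
    return w1, w2, b


def predict_pass_probability(model, attendance, score):
    """Estimated probability that a student passes the next assessment."""
    w1, w2, b = model
    return sigmoid(w1 * attendance + w2 * score + b)


def flag_for_intervention(model, students, threshold=0.5):
    """Return names of students whose predicted pass probability is below the threshold."""
    return [
        name
        for name, attendance, score in students
        if predict_pass_probability(model, attendance, score) < threshold
    ]


model = train_logistic_regression(TRAINING_DATA)
new_students = [("Student A", 0.92, 0.80), ("Student B", 0.52, 0.35)]
print(flag_for_intervention(model, new_students))
```

A real system would use many more students and features, check its accuracy on data it was not trained on, and choose the threshold based on how many interventions a school can actually provide.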
Here is an introduction to some analysis tools that Big Data workers commonly use.
High school students who are already proficient programmers and want to learn more about the coding side of Big Data can take relevant, mostly free, online courses (MOOCs) from a variety of sources, such as the Big Data courses offered through Coursera.