In the latest chapter of the global debate over big data and artificial intelligence, Al Jazeera’s Peter Greste reports on the rise of Hive, a new class of big data software that has become an important technology for companies to deploy and improve on.
The Guardian’s data science correspondent Nick Harris has been tracking Hive for months, and he’s been writing about it since early 2016.
“It’s really important for us to understand the underlying technology and how it works,” said Harris.
“We can use that information to make decisions that we would not normally make on our own, to build things that people would not think of.”
Hive, as Harris puts it, is “an intelligent machine learning system that takes a bunch of data and makes predictions based on that data”.
The idea behind Hive is that its built-in machine learning algorithms are “smart enough” to detect recurring patterns in how people behave within their social networks, and to use those patterns to predict which people are most likely to be in a similar situation in the future.
That means it can make predictions about what might happen next, which Harris says is a key part of both the problem and the solution.
“Hive is a machine learning algorithm that learns how to use different kinds of data, like Facebook posts, tweets, and photos, to make predictions based more on what’s going on in real life than what is in the model,” he told Al Jazeera.
“There’s this big problem, people are using Facebook to connect, to be seen and be connected.
The machine learning is able to make this connection, and it then uses that data to predict how much time they will have to spend in these different groups of people.”
In its current incarnation, Harris says, Hive is “not particularly powerful” for large-scale applications, but it is able “to work on a scale of tens of thousands of people, or tens of millions, or even billions”.
Harris says there are many ways Hive can be used to help businesses, and he believes it could be the next great breakthrough in data science.
“The way Hive is used in many cases is by using the information that people share to predict the outcomes of a particular event,” he said.
“So, for example, the information you might have shared with your partner on Facebook is used to predict that your partner is going to be a little bit older, and then when that happens you can make an algorithm that will predict that you’ll have a slightly younger partner.”
Harris also believes Hive will help in a future in which companies look to create new types of artificial intelligence (AI) that are smarter than the human brain.
“We’ve seen this with self-driving cars and AI systems that are learning and then using that knowledge to predict when that car is going too fast,” he explained.
“In the future, you could imagine an AI system that is trained to understand human behaviour and to predict what the outcome of human behaviour is, but at the same time, it’s trained to be able to predict human behaviour, and that is the next step.”
Hive’s predictions are “quite complex” and “generally very accurate”, Harris says, which he describes as a huge leap in machine learning capabilities.
“This is not a prediction engine, it is a prediction system,” he added.
“I’m really excited about Hive, and I think it’s the next-generation AI system, the next generation AI that is really capable of understanding human behaviour.”
The biggest benefit of Hive, however, is that it allows “big data” to be collected on a much larger scale.
“For example, if you had this data set, maybe 20 million people, and you wanted to learn a little more about that data, you can collect this data and build a prediction model for that person, and if you then ask it to build a model for another 20 million individuals, you have a much bigger data set to work with,” Harris said.
While Harris and his colleagues have been tracking the emergence of Hive and other new AI tools, the number of companies using them has grown exponentially.
“There’s been this big exponential growth,” he noted.
“And so you have this whole space of big, massive data, but with the new tools, there’s also the possibility of building really powerful machine learning models for that data.
And that’s something that Hive has a huge potential to help with.”