Over the past few years I have written on occasion about the "AI Cold War" between the U.S. and China. Many people believe that control over AI technology, particularly as we get closer to human-level intelligence, will be a primary source of geopolitical power in the coming decades. So far, the prevailing wisdom has gone like this: Success in AI depends on data. China collects more data on its citizens and has fewer data privacy laws. Therefore, China will have a data advantage, and thus an AI advantage.
The argument makes sense from a first-order perspective. But once you consider the more complex factors at play, along with trends emerging in the data landscape, it starts to look like having less data could, in some cases, be a benefit.
First, there is the political economy question of free market capitalism vs. state-directed capitalism. One problem with ML/AI models is obtaining labeled data, so in a more open market we will see more entrepreneurs try to build labeled data sets whose value isn't obvious. In fact, one of the things VCs look for is an entrepreneur who has realized a "secret." We want to back founders who understand something the rest of the world is mostly missing, and who can turn that insight into a business. Many of these "secrets" will turn out to be wrong, but the few that are correct will lead to huge companies. In a market where an entrepreneur is free to pursue labeled data that initially makes sense to no one else, that data might give the company, and the country it's based in, a boost down the road.
On the flip side, state-directed capitalism might allow the capture of data sets that aren't permitted in more democratic countries. That means use cases for ML can emerge in places like China that can't emerge in the U.S. Looking at this through the lens of probable outcomes, I don't know how to handicap it in favor of either country, so I'd call it a draw.
But now I want to point out something very important that one of my early ML advisors told me. He mentioned that one reason there has been relatively little work on small data AI is that most of the early AI work was done at large companies, and large companies have large data sets. They had no incentive to figure out small data AI.
I've been fortunate enough to be an investor in Synthesis AI for over a year now, and have had a front-row seat to efforts to solve the lack-of-data problem. What I am seeing in the market is more and more companies using synthetic data, for many different reasons. One is economic: filling out data sets that are expensive to collect and label. Another is time to market: rapidly tweaking data sets to improve model performance when real data would otherwise take a while to gather. A third is privacy: generating images of people who don't really exist to train face models, for example. The broader trend is that synthetic data, and the companies working in the space, are starting to fill the gap. Small data AI is possible by using synthetic data to turn small data sets into larger ones.
The question, then, is whether this negates some of China's perceived advantages in the AI Cold War. Remember that a major argument for why China will win is that its political system allows the gathering of more types of data than we can gather in the U.S. If synthetic data can fill that gap, does the Chinese advantage diminish? I would go further and argue that mastering synthetic data, which the U.S. has more reason to do than China, is a huge long-term benefit, because it lets us rapidly generate simulated data sets to train on as AI makes its way into more corners of the economy.
Perhaps what was perceived as a weakness, stricter data privacy laws, has actually spawned innovation that turned it into a strength.
What makes betting on technology outcomes so difficult in my work as a VC is that these twists and turns aren't always obvious at the beginning. That's why sometimes we just have to place bets in areas where a lot is happening and hope that smart entrepreneurs can navigate to the sweet spot of the market.
To summarize, the race for AI leadership is far from over, and the criteria that ultimately determine who wins may be things that haven't emerged yet, or vectors of technological competition we don't fully understand or don't yet recognize as relevant. Much is happening in the space, from low levels of the technology stack like new AI chips up through data collection issues, synthetic data, and new applications of machine learning. It may be too early to predict a winner, or even the winning criteria. The way to win may be to plant the most technological trees and watch what grows most organically.
Thanks for reading.