[Analytics] How China’s reliance on US-origin deep learning platforms challenges the country’s AI ambitions

Dependence on US frameworks for deep learning is seen as a significant gap in China’s AI ecosystem, potentially hampering efforts to close the AI tech gap with the US by 2030. By Minghe Hu and Zen Soo for the South China Morning Post.

When engineer Kuang Kaiming was assigned to a team developing artificial intelligence (AI) technology for a Shanghai start-up, the company went with two leading open-source software libraries, Google’s TensorFlow and Facebook’s PyTorch. The decision to adopt US core technology over Chinese alternatives was telling of China’s weakness in basic AI infrastructure, despite the country’s success in producing commercially successful AI companies.

Kuang’s company, whose AI product detects abnormalities in X-rays, is by no means alone. Nearly all small- to mid-sized Chinese AI companies rely on the US-originated open-source platforms, which also include MXNet and Caffe, because building an in-house framework from scratch requires a large investment of time and dedicated resources, as well as top-tier talent, to ensure the framework runs smoothly and covers a variety of use-cases.

Established open-source platforms like TensorFlow and PyTorch offer a host of tools and libraries designed for machine learning and deep learning, techniques that teach computers to learn by example. Essentially, they democratise deep learning, allowing almost anyone to feed data into these models and start training their own AI systems without having to build a framework from scratch.

China’s AI national champion, search giant Baidu, introduced its PaddlePaddle open-source AI platform in 2016, only a year after TensorFlow was launched, but it failed to gain traction among global AI programmers.

“Using PaddlePaddle is like buying a smartphone from a lesser-known brand which offers less features,” said Kuang, who joined Diannei Bio-Technology in August as part of its medical tech team. In that situation, “even buying a phone case or accessories like a charging cable is difficult”.

A Baidu spokesman declined to comment on the topic.

China’s reliance on US-originated frameworks constitutes a significant gap in the AI ecosystem, which comprises foundational technologies like algorithms and frameworks, as well as data, semiconductors and computing power.

“China clearly wants to lead the world when it comes to AI and it’s hard to imagine China being seen as a world leader if the open-source frameworks are so US-dominated,” said Helen Toner, director of strategy at Georgetown University’s Centre for Security and Emerging Technology.

Since the platforms and tool kits are open-source, developers who use them often give something back, writing code, fixing bugs or taking part in community discussions, thereby making the software better and stronger than before.

“Open-source frameworks work on a winner-takes-all basis, and since so many are already using TensorFlow and PyTorch and contributing to it, it’s better to use them to execute commercial applications,” said Kuang, who lauded TensorFlow for its modular capabilities that allow functions to be added on like building blocks.

Baidu’s PaddlePaddle remains relatively small. On GitHub, a code-hosting platform, it has only 264 direct contributors, whereas TensorFlow and PyTorch have over 2,000 and 1,000 respectively.

It is not only small Chinese AI start-ups like Diannei Bio-Technology that use TensorFlow. Chinese tech giants like JD.com, China Mobile, Meituan and Sogou have adopted the platform’s technology stack for various deep-learning purposes, according to a list of users on its website.

While PaddlePaddle notes that Huawei, Nvidia and Intel are users of its technology, there are markedly fewer international firms on its user list.

To be sure, there are industry insiders who believe there is no need for China to reinvent the wheel, given that platforms like TensorFlow and PyTorch are open-source and free to use. Fears that the US government could cut China off from using these platforms are unfounded, they say, since US export restrictions do not apply to open-source software.

“TensorFlow and PyTorch are just open platforms, it’s not right to say this is US technology … everyone around the world contributes to it … anyone can use it as long as they abide by the licensing terms,” said Tony Han, chief executive of autonomous driving company WeRide and a former associate professor at the University of Missouri, where he specialised in deep learning and computer vision technology.

“Why should we reinvent the wheel, when we can spend our precious time on more challenging and urgent problems?” he said. “For both academia and the industry, if you want to do something great, you have to form an international team, draw talent from all over the world and collaborate.

“Whoever starts restricting their technology will be the one who gets left behind.”

The debate comes amid a protracted US-China trade war and Washington’s increasing suspicions about China’s technology ambitions, especially in industries with national security implications such as 5G and AI.

As part of its goal to become a global leader in AI, the Chinese government has declared certain companies “national champions” in the technology, including Baidu, SenseTime, Megvii and Hikvision, tasked with spearheading efforts to lead key projects and advance AI development in an attempt to close the AI tech gap with the US by 2030.

China’s technology ambitions and its growing prowess in commercial AI have alarmed Washington, which last month placed eight Chinese AI firms, including SenseTime, Megvii, Hikvision and iFlyTek, on a trade blacklist which bans them from purchasing technology or components from US companies.

The campaign by Washington has also forced some private US tech companies to take pre-emptive action to minimise their exposure to Chinese tech.

Earlier this month, San Francisco-based GitLab said it was considering suspending new hiring for sensitive positions in China and Russia that handle user data because of customer feedback in the “current geopolitical climate”.

While China has access to massive amounts of data and is making a renewed push in developing semiconductors – two of the key foundation technologies for AI – there has been much less focus on developing the basic AI technology infrastructure.

Part of the reason Baidu has been unable to gain traction with PaddlePaddle is that leading open-source machine learning platforms like TensorFlow have an inherent network effect – the more they are used by companies and researchers, the more entrenched they become.

Both Google and Facebook have invested large amounts of money in hiring teams of engineers to maintain TensorFlow and PyTorch, as well as in marketing them to enterprises and the academic world, where research on AI algorithms is done. This means that smaller, less popular platforms like PaddlePaddle will find it difficult to capture market share unless they can offer something unique.

One of the arguments in favour of the open-source approach is that it affords companies like Google and Facebook access to a talent pool that is already familiar with their platforms, as opposed to having to train developers from scratch for a unique, internal company framework that is not used elsewhere.

Similarly, Chinese companies would be able to reap the same benefits if their open-source framework gains momentum, giving them access to top-tier local talent familiar with their technology. Having a widely adopted China-originated framework would not only serve as a mark of maturity in China’s AI ecosystem, it would also allow China to be more self-sufficient.

“It would be in your national interests to have your own framework, even if it’s done just as an insurance policy,” said Miles Wen, chief executive of AI start-up Fano Labs.

“If China wanted to build its own framework, it could [because] much of this is being willing to invest money and resources into it, because nobody really makes money from building frameworks,” he said, adding that benefits are often intangible.

However, building a platform that is widely adopted and easy to use would be an uphill task in China, where the open-source culture is less developed than in the West, according to Daniel Povey, creator of the open-source speech recognition toolkit Kaldi and a former associate research professor in language and speech processing at Johns Hopkins University.

“China doesn’t have a great culture of open-source. People often release things that work, but they don’t really have documentation that clearly explains [how it works],” said Povey, who was recently hired by Xiaomi to build and work on the next-generation Kaldi.

“It seems more of a modern culture of hacking than concentrating on building a good code base. This is a question of short-term thinking, coding something really quickly to get things done without doing it super carefully.”

Many of China’s tech giants have started dabbling in their own deep-learning frameworks, although these are not open-source and therefore not publicly available.

SenseTime, which claims to be the world’s most valuable AI start-up, has its proprietary Parrots framework and does not rely on TensorFlow or PyTorch.

However, this piecemeal approach is part of the reality of an emerging ecosystem in a country which lacks maturity in software.

“This [immaturity] is due to a lack of large software companies that have been around for a long time, but presumably it will normalise over the next 10 years,” said Rodolfo Rosini, partner at AI venture fund Zeroth.ai and a serial AI entrepreneur.

“Right now, Chinese companies use open-source software, but they don’t pay it forward as much,” he said, adding that a strong, open-source developer ecosystem would have a multiplier effect and bring coverage to areas the big companies ignore.

In recent months, some technology giants have already made a move in this direction. In August, Huawei launched its own AI computing framework, MindSpore, which it plans to make open-source in the first quarter of 2020.

Some industry insiders are optimistic that China still has time to close the gap. An analyst at Baidu, who spoke on condition of anonymity as he was not authorised to speak to media, said it is early days for China’s AI landscape and that the country still has a chance to catch up in infrastructure technology.

“Right now, artificial intelligence is being adopted by the large tech companies, but it is not yet widespread among the broader industry,” the analyst said.

“There is still a lot of room for growth, and China still has time to catch up. Look at Huawei – in 2012, its smartphones were not well known, but in recent years they managed to climb to the top.”

Still, concerns around China’s relative weakness in basic AI infrastructure are growing, with some in the industry saying the country needs to create a popular machine learning framework or strengthen existing ones to protect against a doomsday scenario where it could be cut off from platforms like TensorFlow or PyTorch.

One possible, though unlikely, disaster scenario is that Google and Facebook remove open-source access to these platforms.

“If the US ever blocks China’s access to open-source frameworks it will greatly affect the AI industry as companies need time to move to another platform and train their data,” said Kelvin Wang, an AI scientist who previously worked on Baidu’s PaddlePaddle.

“If China loses ground [due to this setback], it will lose competitiveness in AI.”
