The rapid development of generative artificial intelligence has raised questions about the need for legal governance, and China has recently become one of the first jurisdictions to regulate the technology. Together with six other Chinese regulators, the Cyberspace Administration of China issued the Interim Measures for the Management of Generative Artificial Intelligence Services, which came into force on August 15, 2023.
Ian Liu, a partner at Deacons in Hong Kong, said that under the Interim Measures, AI service providers must comply with the law when carrying out pre-training and optimization training. The compliance requirements include: data and foundation models must come from lawful sources; the data used must not infringe a third party’s IP rights; and measures should be taken to enhance the quality of training data.
He also pointed out that service providers shall bear the responsibility as the producers of online information content. “Where providers discover illegal contents, they shall promptly take disposal measures such as stopping generation, stopping the transmission or eliminating the illegal contents, make rectification through measures, such as model-based optimization training, and report to the relevant competent department,” he explained.
As for whether the service providers or the users should be held accountable if the AI system generates infringing content, Liu said: “We view that providers will have responsibilities to remove and employ measures to avoid illegal or infringing content from being generated. On the other hand, according to Article 4 of the Interim Measures, users of generative AI services shall also comply with the laws and respect the IP rights of others. Regarding infringement, each case shall be examined based on its facts. Depending on the fault of the providers and users, liabilities may be apportioned between the providers and users.”
However, restricting access to certain types of data or limiting the amount of usable data for training may hinder the technological development of AI systems in China. Liu explained this is because AI models require a large amount of diverse and high-quality data to learn and improve their performance. Limiting the availability of data could lead to slower progress in developing AI technologies.
“As there have been a number of successes in the training of large language models (LLMs) and base models, the AI developers have obtained benchmarks on the amount and type of data required for training a workable LLM and base models,” he said. “Based on the successful experience, AI developers will have more leeway to experiment with the variety of data required in training an LLM and focus on the quality of training data.”
According to Liu, the Interim Measures have set a standard for the AI industry in the selection of training data. In the short term, additional efforts may be necessary to carefully select and screen data, and ensure the training dataset’s accuracy, diversity, representation and overall quality. However, these endeavours will prove worthwhile in the long run, as a solid and reliable training dataset forms the cornerstone for effectively training a high-quality LLM and base model.
“Given that the legal landscape regarding training data is still evolving and there have been instances of copyright infringement lawsuits against AI developers related to training datasets, it is prudent to exercise caution when selecting and utilizing training data. Taking careful measures in the selection and use of training data can help mitigate the risk of infringement and potential liabilities associated with providing AI services,” he said.
Article 9 of the Interim Measures requires service providers to sign a service agreement with users who register for their service. Because copyright disputes may arise between providers and users, Liu explained, the service agreement should include terms on copyright ownership of AI-generated content, in case such content qualifies as “works” under China’s copyright law or other applicable copyright laws.
“There may be potential copyright issues in relation to the training data set, base model, generated outputs, user-supplied contents and user inputs. Moreover, in China, it is uncertain whether AI-generated content can be owned and protected by the copyright law,” said Liu, adding that even if AI-generated content is not considered copyright-protectable, service providers should still expressly assert their rights over such content.
As service providers will receive IP content from users, Liu advised that the service agreement should include terms licensing user-supplied content and user inputs to the providers for generating AI content and for AI training. The agreement may also include terms allowing users to opt out of having their supplied data used for further training.
“Furthermore, it is imperative to incorporate warranty, disclaimer, indemnity and limitation of liability terms. These terms will serve to elucidate whether providers offer any warranties regarding the services rendered, establish disclaimers pertaining to the base model content, clarify the providers’ or users’ obligations to indemnify the other party in case of third-party claims or losses arising from the use of the generated content or services, and delineate any limitations of liability for providers or users, if applicable.”
Finally, Liu advised that service providers should include terms specifying that users are responsible for their input to the service, their use of the service and any AI-generated content produced as a result. He said providers would generally also impose conditions on the use of the base model and the generated content or services, such as terms prohibiting use of the AI services in illegal or harmful ways or for illegal or harmful purposes, and restrictions on use of the base model.