GitHub Copilot 'code laundering'

July 2021
Updated: December 2021

Reports by The Verge, Wired and other publications detail copyright concerns about GitHub's new Copilot automated code generation tool.

GitHub developed Copilot in conjunction with OpenAI by training the system on source code from publicly available GitHub repositories, leading to accusations of copyright abuse and 'code laundering'.

Lawyers are divided on whether GitHub and OpenAI's use of public repositories can be considered 'fair use'. Copilot draws extensively on code released under many different types of public license, some of which require attribution for derivative works. GitHub's commercial business model and OpenAI's shift away from being a non-profit entity further complicate the question.

Copilot is built on OpenAI's new Codex model, which is descended from its GPT-3 language-generating model. GPT-3 has been found to suffer from inappropriate, offensive, racist and personal training data, resulting in several high-profile public backfires.

Operator: Microsoft/GitHub
Developer: Microsoft/GitHub; OpenAI
Country: USA
Sector: Technology
Purpose: Generate code
Technology: NLP/text analysis
Issue: Copyright; Ethics; Privacy
Opacity: Attribution