GitHub Copilot 'code laundering'
July 2021
Updated: December 2021
Reports by The Verge, Wired and other publications detail copyright concerns about GitHub's new Copilot automated code generation tool.
GitHub developed Copilot in conjunction with OpenAI, training the system on source code from publicly available GitHub repositories, which led to accusations of copyright abuse and 'code laundering'.
Lawyers are divided on whether GitHub and OpenAI's use of public repositories can be considered 'fair use'. Copilot draws extensively on code released under many different types of public license, some of which require attribution for derivative works. GitHub's commercial business model and OpenAI's shift away from being a non-profit entity further complicate the question.
Copilot is built on OpenAI's new Codex model, which is descended from its GPT-3 language model. GPT-3 has been found to have been trained on data containing inappropriate, offensive, racist and personal content, resulting in several high-profile public backfires.
Operator: Microsoft/GitHub
Developer: Microsoft/GitHub; OpenAI
Country: USA
Sector: Technology
Purpose: Generate code
Technology: NLP/text analysis
Issue: Copyright; Ethics; Privacy
Opacity: Attribution
References
News, commentary, analysis
https://www.wired.com/story/github-commercial-ai-tool-built-open-source-code/
https://www.theverge.com/2021/6/29/22555777/github-openai-ai-tool-autocomplete-code
https://www.theverge.com/2021/7/7/22561180/github-copilot-legal-copyright-fair-use-public-code
https://www.fastcompany.com/90653878/github-copilot-microsoft-openai-coding-tool-backlash
https://thenewstack.io/github-copilot-a-powerful-controversial-autocomplete-for-developers/
https://thenextweb.com/news/github-copilot-ai-copyright-analysis
https://analyticsindiamag.com/why-are-people-criticising-github-copilot/