Prosecraft fiction analytics database
Prosecraft is a so-called 'linguistic database of literary prose' built on the texts of over 25,000 novels by thousands of different authors.
Benji Smith, who is behind Prosecraft and its sister company Shaxpir, used the content of books by Stephen King, Nora Roberts, Neil Gaiman, Angie Thomas, Terry Pratchett, John le Carré, and others to build a database of novels that could be used to analyse their word count, story arc, and 'vividness', amongst other criteria.
Dataset info
Operator: Benji Smith/Shaxpir
Developer: Benji Smith/Shaxpir
Country: USA; Australia
Sector: Media/entertainment/sports/arts
Purpose: Analyse literature
Technology: Database/dataset
Issue: Copyright; Ethics/values
Transparency: Governance; Marketing
Risks and harms
The Prosecraft fiction analytics database raised copyright concerns and potentially compromised authors' creative rights by analysing and categorising published works without explicit permission, which could also lead to the unintended exposure of their writing styles and techniques.
Transparency and accountability
The Prosecraft fiction analytics database is seen to suffer from several significant transparency limitations.
Lack of consent. Prosecraft used the full text of over 25,000 copyrighted books without obtaining permission from the authors or publishers, raising significant ethical and legal concerns.
Data collection. The methods used to acquire the texts, described as "crawling the internet," are not fully transparent, leaving questions about the legality and ethics of the data collection process.
Data sources. The developer, Benji Smith, did not disclose the sources of the works or whether he had received permission to use them, which undermines the transparency of the dataset's origins and usage.
Algorithmic workings. The dataset was used to train AI algorithms to analyse literary styles, but there was no clear explanation of how these algorithms were developed or the specific data processing methods employed.
Usage guidelines. There were no clear guidelines or restrictions on how the data could be used, potentially leading to misuse or unethical applications.
Monetisation. Though Prosecraft itself was not monetised, parts of the data were used to develop Shaxpir, a paid tool, raising questions about the commercial exploitation of the data without proper authorisation.