BookCorpus dataset accused of copyright abuse and bias