This tool contains an intelligent algorithm that automatically neutralizes sensitive text on the web. It is delivered as a plugin that combines NLP and web development.
People from all over the world now interact with each other on the internet. To protect marginalized communities from harassment, we need a robust and general solution. This solution will only work if both parties win.
We plan to build a generalized machine learning model that can detect and neutralize racist and sexist content. Our solution will have three sections.
Based on the threat level of the text, we will either auto-report the site or original poster, or simply neutralize the text to suit the user's taste.
For example:
"I will kill this faggot." counts as a threat.
"This faggot doesn't know anything." is flagged as insensitive
and not a threat. The text will be neutralized.
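
The routing described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Label` and `Action` names, the extra `NEUTRAL` label, and `choose_action` itself are hypothetical and do not come from the project's code. The classifier that would produce the label is not shown.

```python
from enum import Enum


class Label(Enum):
    """Hypothetical classifier outputs (names are assumptions)."""
    THREAT = "threat"            # e.g. text that threatens violence
    INSENSITIVE = "insensitive"  # offensive, but not a threat
    NEUTRAL = "neutral"          # assumed third case: nothing to do


class Action(Enum):
    """Hypothetical actions the plugin could take."""
    REPORT = "report"            # auto-report the site / original poster
    NEUTRALIZE = "neutralize"    # rewrite the text for the user
    PASS_THROUGH = "pass_through"


def choose_action(label: Label) -> Action:
    """Map a predicted label to an action, following the two cases in the text."""
    if label is Label.THREAT:
        return Action.REPORT
    if label is Label.INSENSITIVE:
        return Action.NEUTRALIZE
    return Action.PASS_THROUGH


if __name__ == "__main__":
    for label in Label:
        print(label.value, "->", choose_action(label).value)
```

Keeping the decision separate from the classifier means the reporting policy can change without retraining the model.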
Text classification is a key function required for the detection of racist and sexist text. For this, we have scraped data from over 20 sources and collated it into one consolidated dataset of over 30,000 rows of sexist text, annotated and classified.
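
As a rough sketch of what collating several scraped sources into one dataset could look like, the snippet below merges rows and drops duplicates after light normalization. The function names, the `(text, label)` row shape, and the deduplication rule are all assumptions made for illustration; the project's actual pipeline is not described here.

```python
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical rows compare equal."""
    return " ".join(text.lower().split())


def consolidate(sources: Dict[str, List[Tuple[str, int]]]) -> List[dict]:
    """Merge (text, label) rows from several sources, keeping the first copy of each text."""
    seen = set()
    merged = []
    for source_name, rows in sources.items():
        for text, label in rows:
            key = normalize(text)
            if not key or key in seen:
                continue  # skip empty rows and duplicates across sources
            seen.add(key)
            merged.append({"text": text.strip(), "label": label, "source": source_name})
    return merged


if __name__ == "__main__":
    # Toy placeholder rows, not real data.
    demo = {
        "source_a": [("Example  sentence one", 1)],
        "source_b": [("example sentence one", 1), ("Example sentence two", 0)],
    }
    for row in consolidate(demo):
        print(row)
```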
XLNet is an extension of the Transformer-XL model, pre-trained with an autoregressive method that learns bidirectional contexts by maximizing the expected likelihood over all permutations of the input sequence's factorization order. In simple words, XLNet is a generalized autoregressive model. XLNet uses Transformer-XL as its feature-extracting architecture, which improves on BERT’s Transformer by adding recurrence. This can give XLNet a deeper understanding of the language context. XLNet has outperformed BERT, T5, and DistilBERT in the text classification domain.
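
For reference, the permutation language modeling objective mentioned above can be written as follows. The notation follows the XLNet paper rather than this project: $\mathcal{Z}_T$ is the set of all permutations of the index sequence $[1, \dots, T]$, $z_t$ is the $t$-th element of a permutation $z$, and $z_{<t}$ its first $t-1$ elements.

```latex
\max_{\theta} \;
\mathbb{E}_{z \sim \mathcal{Z}_T}
\left[
  \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid x_{z_{<t}} \right)
\right]
```

Because every token is predicted from varying subsets of the other tokens across the sampled orders, the model sees context from both sides while remaining autoregressive.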
For the purpose of paraphrasing, we have two main frameworks in mind: