Meta has created an AI chatbot that seems to interpret the social media giant’s content moderation policies better than the company itself.
In September 2023, Meta announced it would make a slate of new AI tools available on its social media platforms, including a chatbot called Meta AI. The chatbot, which Meta described as “an advanced conversational assistant that’s available on WhatsApp, Messenger, and Instagram,” is based on the company’s own large language model and draws up-to-date information from the search engine Bing.
Meta reportedly “spent 6,000 hours” peppering Meta AI with queries to surface potential “problematic use cases” for the tool and thereby “avoid as many PR disasters as it can.” The company has also started training its large language models on its community standards “to help determine” whether content is violative.
Media Matters has spent years reporting on problematic uses of Meta’s platforms, particularly Instagram’s failure to keep a lid on hate speech, conspiracy theories, and other content that seems to violate its content moderation policies. So we decided to ask Meta AI why such content persists.
When we asked the chatbot about an account spreading anti-Black racism that Instagram has refused to ban, for instance, Meta AI clearly identified the account as promoting “hate speech and white supremacist ideologies,” content that the platform’s community guidelines ostensibly prohibit.
The chat tool also offered suggestions for how to improve Instagram’s content moderation, along with a list of reasons why those improvements may not yet have been implemented. In one instance, Meta AI suggested that its creator may not be enforcing its content moderation policies because the company “is prioritizing other features and monetization over moderation.”