Former Meta security expert Arturo Béjar, who was reportedly hired specifically to help prevent harms against children, is speaking out about Instagram’s moderation failures, including the personal impact on his own daughter, who has been repeatedly harassed on the platform. Béjar’s claim that Meta “chose not to” address the known harms teenagers were experiencing on its platforms is part of the company’s long history of failing to adequately moderate harmful content for children and teens.
In a November 2 report by The Wall Street Journal, Béjar detailed his experience trying to raise his concerns internally at Meta. In 2021, Béjar wrote to Meta CEO Mark Zuckerberg and other company officials, describing “a critical gap in how we as a company approach harm” and laying out ideas for how the platform could better address the problem. “Two years later, the problems Bejar identified remain unresolved, and new blind spots have emerged,” the Journal wrote, despite the company’s own metrics showing that “the approach was tremendously effective.”
The outperformance of Meta’s automated enforcement relied on what Bejar considered two sleights of hand. The systems didn’t catch anywhere near the majority of banned content—only the majority of what the company ultimately removed. As a data scientist warned Guy Rosen, Facebook’s head of integrity at the time, Meta’s classifiers were reliable enough to remove only a low single-digit percentage of hate speech with any degree of precision.
...
Also buttressing Meta’s statistics were rules written narrowly enough to ban only unambiguously vile material. Meta’s rules didn’t clearly prohibit adults from flooding the comments section on a teenager’s posts with kiss emojis or posting pictures of kids in their underwear, inviting their followers to “see more” in a private Facebook Messenger group.
Narrow rules and unreliable automated enforcement systems left a lot of room for bad behavior—but they made the company’s child-safety statistics look pretty good according to Meta’s metric of choice: prevalence.
Béjar testified before Congress last week, where he was introduced as an engineer “who was hired specifically to help prevent harms against children,” and he detailed his firsthand experience with how the company’s “executives, including Zuckerberg, knew about the harms Instagram was causing but chose not to make meaningful changes to address them.”
In an interview with The Associated Press, Béjar said, “I can safely say that Meta’s executives knew the harm that teenagers were experiencing, that there were things that they could do that are very doable and that they chose not to do them.”
Béjar’s ignored warnings are part of a larger pattern of Meta failing to address harmful content on its platforms, particularly content affecting children and teens. The company currently faces a lawsuit from multiple states for allegedly “knowingly using features on Instagram and Facebook to hook children to its platforms.” Massachusetts has also filed a separate lawsuit against Meta over Zuckerberg’s alleged dismissal of concerns about Instagram’s impact on users’ mental health.
Media Matters has previously reported on Meta’s moderation failures on Instagram, including the platform recommending gimmicky weight loss posts, allowing the spread of COVID-19 misinformation, hosting bigoted anti-LGBTQ accounts that have baselessly referred to LGBTQ people as “groomers,” and failing to stop the proliferation of hate speech. The “narrow rules and unreliable automated enforcement systems” described by The Wall Street Journal fail to take into account wider context, in-group subtext, or ways that the platform can be manipulated to spread offensive rhetoric or dangerous misinformation.
For example, Media Matters reported in 2022 that users seemingly evade moderation on Meta's platforms by commenting on posts with code words or phrases, intentional misspellings, or emojis. While these evasion tactics are not new, Instagram is clearly still struggling to prevent users from manipulating the platform to spread explicit hate speech and other harmful content, despite internal warnings from experts like Béjar.