Emiliana Mbelenga built Project RISE because formal disclosure systems weren’t reaching Kenyan children experiencing sexual abuse. The institutional platforms existed, complete with reporting mechanisms, trained moderators, and established protocols, but they operated in English with assumptions about institutional trust that didn’t map onto the communities where harm was occurring. The children Mbelenga works with speak Sheng, a street language mixing English and Swahili, and they don’t trust government reporting systems. UNICEF and Interpol research from 2021 found that 61% of students surveyed didn’t know where to get help when they experienced online sexual exploitation (1). When children did disclose to caregivers, 31% had their devices confiscated. To the caregiver, confiscation was protective. To the child, it meant disclosure got punished.
So Mbelenga built infrastructure that works in the language children speak and through trust relationships they recognize: a platform in Sheng with a 24/7 AI chatbot for confidential support, a caregiver portal that educates parents on trauma recognition, and Safe Circles, teacher-led peer support groups in schools. The platform reaches children the institutional systems miss, not because those systems are poorly designed for their stated purpose, but because their purpose is producing documentation that satisfies institutional accountability requirements, in the language and through the trust relationships institutions recognize. What makes Project RISE effective isn't technological sophistication but proximity: it operates in the language, norms, and trust relationships where harm is first recognized.
At the SHIELD Global Online Safety Conference in March 2026, this pattern was documented across harm types, regions, and platforms (2). Safety interventions consistently operate at the level where harm becomes visible: content is posted, behavior is flagged, reports are filed. The conditions that produce that harm lie one layer deeper, in the design choices made before anyone uses the product and in the communities those design choices never considered. This isn't unique to Kenya or to disclosure systems. Wherever institutions respond to digital harm, intervention resources concentrate at the layer where harm has already become visible and documentable, while the conditions that produced it sit one layer below, in design decisions and community exclusions that rarely face the same scrutiny.
Where Responses Land Versus Where Harm Starts
The visible layer gets the intervention resources: content moderation removes harmful posts, age verification restricts platform access, parental controls monitor screen time, and regulation holds platforms accountable for what they publish. These are real interventions addressing real harm, but they address the visible output of harm rather than the conditions producing it.
Those conditions are one layer lower, in places that rarely get the same intervention resources. The first is the design layer, where decisions made before a product launches determine what kind of harm becomes structurally inevitable. A platform engineered primarily for engagement will produce harm as a design output, not as an exception to it. By the time content moderation arrives to clean up what’s already spread, the platform architecture has already shaped how users interact, what content gets amplified, and which behaviors get rewarded. For the people moving through that system, especially children, those architectural choices determine exposure long before any policy violation is logged.
YouTube’s recommendation algorithm optimizes for watch time, creating predictable patterns where users watching fitness content get recommended increasingly extreme diet content, users researching vaccines get pushed toward anti-vaccine material, and users watching news clips find themselves in progressively more sensationalized political echo chambers. Content moderation can remove individual harmful videos after they’ve been flagged, but the algorithm that recommended them to thousands of users before removal continues optimizing for the same engagement metrics that amplified the harm in the first place.
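YouTube has not published its ranking code, and the real system is far more complex than any illustration, but the structural point can be made with a toy sketch. Everything below is invented for illustration (field names, the watch-time signal, the cutoff): it ranks candidate videos purely by predicted watch time, and no notion of harm appears anywhere in the objective, so removing an individual video changes the candidate pool without changing what the system optimizes for.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    video_id: str
    predicted_watch_minutes: float  # the model's estimate of how long this user will keep watching

def rank_for_watch_time(candidates: list[Candidate], k: int = 10) -> list[Candidate]:
    """Toy engagement-optimized ranker: surface whatever the model predicts will
    hold attention longest. Harm never enters the score, so content that escalates
    (more extreme diets, more sensational politics) wins whenever it retains
    viewers better than moderate content does."""
    return sorted(candidates, key=lambda c: c.predicted_watch_minutes, reverse=True)[:k]
```

Content moderation operates on the candidates; the objective that decides which of them get amplified is untouched.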
The second is the community layer, where tools designed for one context fail predictably when deployed in communities that weren’t part of the design conversation. A disclosure platform built in English with assumptions about formal institutional trust reaches children in English-speaking communities who trust formal institutions, but it doesn’t reach children who speak Sheng in Kenya and have learned through experience not to trust government reporting systems. Age verification designed around government-issued IDs works in contexts where teenagers have those IDs, but fails completely in communities where most teenagers don’t have formal identification documents. The tool works exactly as designed. In this case, the design excludes the communities it claims to protect.
The Architecture Problem
Instagram’s approach to sextortion demonstrates how responses at the visible harm layer consistently miss the design decisions producing that harm. In 2024, sextortion was thriving on the platform, so Meta responded with a progressive rollout of safety features, each one triggered by documented pressure.
- After parental advocacy groups documented that perpetrators were using stranger DMs to initiate contact, Meta limited DMs from strangers to teens and restricted who can message teens who don’t follow them back.
- After reports showed teenagers weren’t enabling the blur feature because they didn’t understand the risk, Meta implemented nudity protection that automatically blurs images detected as nude without requiring user opt-in.
- After law enforcement documented accounts systematically targeting multiple minors, Meta added account restrictions for adult accounts showing suspicious patterns of contacting minors.
- After mental health organizations reported that sextortion victims didn’t know where to get help, Meta built reporting flows specifically for sextortion that connect users to support resources.
Each feature addressed a visible symptom of the harm, yet the design decisions enabling sextortion remained fundamentally unchanged. A perpetrator can still establish contact through comments, build trust over weeks or months, move to DMs once accepted as a known contact, then initiate coercion in a space designed to be private and unmonitored. The safety features added friction to image transfer without addressing the architecture that makes coercive relationships possible in the first place.
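Meta has not disclosed how its suspicious-contact detection works, but a toy heuristic (everything below is invented for illustration, including the threshold) shows why features like these sit at the symptom layer: the signal they key on only exists after an adult has already initiated contact with minors, and nothing in the rule touches the comment-to-DM pathway that made that contact easy.

```python
from dataclasses import dataclass

@dataclass
class ContactEvent:
    sender_is_adult: bool
    recipient_is_minor: bool
    recipient_follows_sender: bool

def restrict_account(events: list[ContactEvent], threshold: int = 5) -> bool:
    """Toy symptom-layer rule: restrict an adult account once it has initiated
    contact with enough minors who don't follow it back. By the time the
    threshold trips, the architecture has already done its work; grooming that
    stays under the threshold, or moves through accepted DMs, never registers."""
    unsolicited = [
        e for e in events
        if e.sender_is_adult and e.recipient_is_minor and not e.recipient_follows_sender
    ]
    return len(unsolicited) >= threshold
```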
Meta can point to feature adoption metrics showing millions of users have protections enabled, but practitioners working with sextortion victims report that the features don’t address the manipulation tactics, the threats, or the power imbalance that makes teenagers vulnerable. The features respond to symptoms while the structural conditions that produce them remain intact, and that pattern follows directly from how accountability frameworks are structured.
Why Institutional Responses Stop At The Visible Layer
Platform accountability frameworks are structured around what institutions can measure and regulate: published content, user reports, response times, enforcement rates. These are the legible outputs of harm, quantifiable enough for regulators to audit and platforms to demonstrate compliance with. Design decisions are harder to regulate because they operate at a level where harm is still theoretical rather than documented.
- How do you measure whether a recommendation algorithm is structured to minimize harm versus maximize engagement when both goals run on fundamentally similar underlying architecture? (The sketch after this list makes the problem concrete.)
- How do you audit whether a platform’s core business model creates conditions that make certain harms structurally inevitable?
- How do you regulate design choices made before a product launches, before harm becomes visible, before anyone can demonstrate that the architecture will produce predictable failures?
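To see why the first question is hard, consider that an engagement-maximizing ranker and a harm-aware one can be the same code with different numbers. The sketch below is hypothetical (the fields, the harm signal, and the weights are all invented), but it illustrates why inspecting the architecture alone doesn’t tell an auditor which goal the system serves: the goal lives in operator-chosen weights and training objectives, not in structure a regulator can point to.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    predicted_watch_minutes: float
    predicted_harm_risk: float  # hypothetical harm signal; its weight is the only difference below

def score(c: Candidate, engagement_weight: float, harm_weight: float) -> float:
    # Identical architecture either way: the same features go in,
    # the same weighted sum comes out.
    return engagement_weight * c.predicted_watch_minutes - harm_weight * c.predicted_harm_risk

# "Maximize engagement" and "minimize harm" are the same code path with different constants.
def engagement_first(c: Candidate) -> float:
    return score(c, engagement_weight=1.0, harm_weight=0.0)

def harm_aware(c: Candidate) -> float:
    return score(c, engagement_weight=1.0, harm_weight=5.0)
```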
The institutional accountability layer operates where harm becomes measurable through user complaints and content violations. Design accountability would require intervening where harm is still architectural rather than actualized, where the evidence is system analysis rather than incident reports, where the intervention is “don’t build it that way” rather than “remove it after it causes documented harm.” The design layer isn’t the only place where harm originates below where institutions see it. Community exclusion creates another gap.
The Community Design Gap
Safety tools built without the communities they claim to protect fail in ways that the designers didn’t foresee because the communities experiencing the failure weren’t in the room when the tool was built. A parental control app designed in Silicon Valley often assumes that families have regular conversations about technology use, that parents have time to review detailed activity reports, and that the trust relationship between parent and child makes monitoring feel protective rather than invasive. Those assumptions map onto some families while completely missing families where parents work multiple jobs with limited time for dashboard review, where language barriers exist between immigrant parents and English-speaking children, where discussing technology use isn’t culturally normative, and where device monitoring would be interpreted as profound distrust rather than protective care.
The tool works as designed. The design just excluded the communities where those assumptions don’t hold, and that exclusion isn’t visible to the people who built it because those communities weren’t part of the design process that would have revealed the gap. This gap between design assumptions and community reality is precisely why practitioners working inside those excluded communities see harm that institutional systems miss.
Where Safety Is Actually Produced
Athena Morgan built Mindful Clicks Africa in Kenya after observing that children who actively create content online show significantly stronger resistance to harmful content than children who only consume. That finding should shape safety tool design. Instead, institutional safety approaches built in Western contexts emphasize restriction and monitoring over creative engagement, operating from fear-based frameworks that suppress the exact skills Morgan’s research shows build resilience.
Her locally rooted resilience models consistently outperform imported safety programs, not because they’re more sophisticated but because they’re built inside the cultural contexts where they operate. Western safety tools assume fear-based restriction works universally. Kenyan families that prioritize creative engagement over protective monitoring find those tools culturally inappropriate and fundamentally mismatched to how they actually raise children. The tools work as designed. The design excluded the communities it claimed to protect.
Morgan sees harm arising at two layers that institutional responses don’t reach: in the design assumptions that treat restriction as the universal best practice, and in the community exclusions that result when those assumptions are built into products without the excluded communities in the room. Morgan’s observation about where harm originates hints at the resource allocation problem: institutions invest where harm is already visible and can be counted, not where it starts.
The Intervention Gap
Intervention resources concentrate where harm is visible and can be counted: content moderation systems, platform reporting tools, age verification mechanisms, parental control dashboards. These interventions operate at the layer where harm has already occurred and can be documented. What gets substantially less investment are design accountability mechanisms that would intervene before products launch, community-inclusive design processes that would reveal which populations the current design systematically excludes, and architecture reviews that would identify when core engagement mechanics make certain harms structurally inevitable.
The SHIELD conference documented this resource allocation pattern across contexts. Platforms invest millions in content moderation systems while making minimal changes to the recommendation algorithms and engagement optimization mechanics producing the content being moderated. Regulators focus enforcement on holding platforms accountable for published content while the design decisions that shape what gets published and how it gets amplified remain largely outside regulatory scope. Safety tool developers build monitoring and restriction systems, while the communities that those tools will predictably fail to protect aren’t included in the design conversations that would reveal the gap.
This resource concentration at the visible layer isn’t inevitable. It is a result of how regulatory frameworks define compliance.
What Changes When Intervention Reaches The Design Layer
Regulatory frameworks that required design accountability rather than just content accountability would fundamentally shift how platforms approach safety.
- If platforms couldn’t satisfy compliance requirements by demonstrating content moderation activity but had to prove their core architecture doesn’t systematically produce harm, the design process would change before products launch.
- If age verification systems had to demonstrate they actually work for the communities they claim to protect rather than just documenting verification attempts, the design would need to include the communities currently excluded by assumptions about government-issued IDs.
- If parental control tools had to show effectiveness across the cultural contexts where they’ll be deployed rather than just listing features, the development process would require including the communities currently designed out.
That shift would require regulators and funders who understand that harm originating in design decisions can’t be adequately addressed by content moderation responses, that tools built without marginalized communities will systematically fail those communities regardless of feature sophistication, and that consistently intervening one layer above where harm originates produces documentation of activity without producing safety. That regulatory shift would only be possible with a deeper understanding of why institutions currently measure at the visible layer rather than the origination layer.
What This Means For Safety Infrastructure
Safety responses that concentrate on where harm is visible will continue to produce activity records while harm persists at the origination layer. Content moderation will keep removing harmful posts while the algorithms amplifying them continue optimizing for engagement. Age verification will keep documenting verification attempts while communities without government-issued IDs remain excluded by architectural assumptions. Parental controls will keep monitoring screen time while families from non-Western contexts find the tools culturally inappropriate and fundamentally unusable. The interventions work as designed. The design just places them one layer above where the harm they’re meant to address actually starts.
The practitioners at the SHIELD conference are working at the origination layer: in design processes that include excluded communities, and in communities where institutional tools systematically fail. They demonstrate what becomes possible when intervention happens where harm actually originates. That work operates at a fundamentally different layer than institutional safety responses, requiring different expertise, different trust relationships, and different accountability mechanisms. It produces different outcomes because it addresses different conditions, building from the ground up with the communities institutional approaches excluded by design.
The question isn’t whether we know harm originates in design decisions and community exclusions. The practitioners documenting this pattern have made that clear. The question is whether institutional resources and regulatory frameworks will follow that knowledge or continue concentrating where harm is easiest to measure rather than where it actually starts.
Consistently intervening above where harm originates produces documentation of activity while leaving the conditions producing harm structurally intact. Harm that originates in design decisions requires design accountability. Harm that originates in community exclusion requires community-inclusive design processes. Responding to visible harm one layer above where it starts will continue generating the metrics that satisfy institutional requirements while practitioners working at the origination layer document that the harm those metrics claim to address continues unabated.
That’s a choice about which layer matters enough to fund, regulate, and hold accountable. Right now, institutions choose the layer where harm is legible to them over the layer where harm actually begins. That choice has a cost, and the people excluded from institutional design are the ones paying it. Real safety is produced only when resources and authority are invested in people embedded in the communities and trust relationships where harm first emerges, not in systems that intervene after harm becomes visible and countable.
I work closely with community-level online safety practitioners and am building accountability infrastructure for tech. Based in Lecce, Italy.
Sources cited in this article:
1. UNICEF & Interpol – Disrupting Harm in Kenya report: https://safeonline.global/wp-content/uploads/2023/12/DH-Kenya-Report_Revised30Nov2022.pdf
2. SHIELD Global Online Safety Conference – reference document: From the Field – SHIELD
Selected References and Further Reading
The examples and structural patterns discussed in this post draw on documented research, public disclosures, and institutional accountability frameworks. The following resources provide additional context on safety infrastructure, design‑layer accountability, and the limits of outcome measurement.
Online Safety, Disclosure, and Child Protection
- UNICEF & Interpol – Online Sexual Exploitation and Abuse of Children
Research on disclosure, help‑seeking behavior, and barriers faced by children experiencing online sexual harm.
https://www.unicef.org/documents/online-sexual-exploitation-and-abuse-children
- National Center for Missing & Exploited Children (NCMEC) – Financial Sextortion
Reporting and analysis on online sexual exploitation and sextortion involving minors.
https://www.missingkids.org/theissues/sextortion
Platform Safety Responses and Transparency (Meta)
- Meta Transparency Center
Public reporting on content moderation, enforcement metrics, and safety feature adoption across Meta platforms.
https://transparency.meta.com
- Instagram Blog – Announcements
Meta’s own descriptions of safety features, policy updates, and platform responses to online harm.
https://about.instagram.com/blog/announcements
- Meta Oversight Board – Charter & Scope
Defines the jurisdiction of Meta’s Oversight Board, including its focus on individual content decisions rather than system-level design or algorithmic amplification.
https://www.oversightboard.com/governance/charter
Algorithms, Engagement, and Structural Harm (Google / YouTube)
- Mozilla Foundation – YouTube, Algorithms, and Harm
Independent research documenting how engagement‑optimized recommendation systems create predictable patterns of amplification and harm.
https://foundation.mozilla.org/youtube
- Google – AI Principles
Google’s published framework outlining its approach to responsible AI development and governance.
https://ai.google/principles
AI Safety Frameworks and Institutional Measurement (OpenAI)
- OpenAI – Safety & Preparedness
Documentation on AI safety research, system evaluations, and governance frameworks for advanced models and autonomous systems.
https://openai.com/safety
Practitioner and Design‑Layer Perspectives
- SHIELD Global Online Safety Conference – Reference Document (March 2026)
Practitioner‑led synthesis highlighting recurring gaps between institutional safety responses and harm origination at the design and community layers.
(see the sources list above)
These materials reflect the distinction discussed throughout this post between safety measures that generate institutional accountability artifacts (policies, reports, features, metrics), and interventions that address where harm actually originates, in platform architecture and community exclusion.
They are not exhaustive, but are representative of the systems and accountability structures that shape how digital safety is currently designed, measured, and regulated.