ANGELINE CORVAGLIA

You Can’t Protect What You Can’t See

[Image: a young man with imagery representing data in the background, alongside a quote about invisible influence.]

Tech platforms have extensive behavioral data showing how users respond to different content, design choices, and interactions. They track what keeps people scrolling, what brings them back, and which content drives engagement. They use this knowledge to optimize behavior at scale.

People trying to protect users from harmful platform influence work from harm reports, user surveys, and observations of outcomes. They document damage after it occurs. They cannot access the mechanisms producing it.

This knowledge gap exists because behavioral data is proprietary, classified as a trade secret and a source of competitive advantage. The question is whether protection can function when influence operates through mechanisms that only the companies deploying them can examine. As long as the data showing how people actually get influenced online stays locked inside platforms, protection efforts operate blind while platforms optimize with full visibility.

What Behavioral Data Actually Reveals

Behavioral data shows which content keeps users engaged, which interface changes increase time on platform, and which notification patterns trigger return visits. This goes beyond aggregate statistics. Platforms track individual responses to different content types, measure how those responses change based on timing and context, and identify which user characteristics predict engagement with specific material.
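As a rough illustration, a single behavioral record of this kind might look something like the sketch below. The field names are illustrative assumptions, not any platform's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical per-interaction record; field names are illustrative assumptions,
# not any platform's actual schema.
@dataclass
class EngagementEvent:
    user_id: str           # pseudonymous user identifier
    content_id: str        # item shown to the user
    content_type: str      # e.g. "short_video", "image", "text"
    shown_at: datetime     # timing context; hour of day and weekday derivable
    dwell_seconds: float   # how long the user stayed on the item
    completed: bool        # viewed or read to the end
    interacted: bool       # liked, shared, commented, etc.
    session_position: int  # where in the session the item appeared

event = EngagementEvent("u123", "c456", "short_video",
                        datetime(2024, 5, 1, 23, 15), 41.0, True, False, 7)
```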

The data reveals patterns most users do not consciously recognize: which emotional states are associated with higher engagement, which times of day users are more responsive to prompts, and how different content sequences affect whether users continue or stop viewing. Platforms also observe how recommendation approaches perform across different user segments.

A platform testing whether certain content arrangements increase session length can measure effects across millions of users simultaneously, observe which variations produce stronger engagement, and implement those approaches at scale. For example, platforms can test different combinations of short and long videos, or different notification framings, to see which patterns lead to higher return rates. These examples illustrate the kinds of experiments platforms can run, not specific confirmed practices. The platform measures results, identifies what works, and deploys it widely.
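A minimal sketch of that measure-and-compare step, assuming two hypothetical content arrangements and invented session-length numbers, might look like this. It illustrates the logic, not any platform's actual experimentation pipeline.

```python
import random
from statistics import mean

# Minimal A/B comparison sketch: two hypothetical content arrangements,
# compared by average session length. All numbers are invented for illustration.
def simulate_sessions(mean_minutes: float, n: int) -> list[float]:
    """Stand-in for observed session lengths under one arrangement."""
    return [max(0.0, random.gauss(mean_minutes, 5.0)) for _ in range(n)]

variant_a = simulate_sessions(mean_minutes=22.0, n=100_000)  # current arrangement
variant_b = simulate_sessions(mean_minutes=23.5, n=100_000)  # candidate arrangement

lift = (mean(variant_b) - mean(variant_a)) / mean(variant_a)
print(f"A: {mean(variant_a):.2f} min  B: {mean(variant_b):.2f} min  lift: {lift:.1%}")
# Deploy the candidate widely only if the measured lift holds up.
```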

This knowledge enables precise behavioral optimization. Platforms identify patterns associated with engagement, develop content formats that align with those patterns, and deploy them to user segments most likely to respond. The optimization runs continuously, adjusting based on real-time measurement. An approach that worked previously is replaced when data shows a different approach performs better.
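The continuous part of this loop can be pictured as a simple bandit-style routine: mostly serve whichever approach currently measures best, keep trying alternatives on a small share of traffic, and switch when something else performs better. The approach names and engagement rates below are invented for illustration.

```python
import random
from statistics import mean

# Epsilon-greedy sketch of continuous optimization: exploit the current winner,
# explore alternatives occasionally, and update estimates from observed responses.
observed = {"format_x": [], "format_y": [], "format_z": []}
true_engagement = {"format_x": 0.30, "format_y": 0.35, "format_z": 0.25}  # unknown to the system
EPSILON = 0.1  # fraction of traffic reserved for exploration

def current_best() -> str:
    return max(observed, key=lambda a: mean(observed[a]) if observed[a] else 0.0)

for _ in range(10_000):  # each step stands in for a batch of users
    if random.random() < EPSILON:
        choice = random.choice(list(observed))  # explore an alternative
    else:
        choice = current_best()                 # exploit the current winner
    engaged = random.random() < true_engagement[choice]
    observed[choice].append(1.0 if engaged else 0.0)

print("Served most often:", max(observed, key=lambda a: len(observed[a])))
```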

How Platforms Use This Knowledge

Facebook’s internal research documented that Instagram can negatively affect teenage girls’ body image and identified content types and recommendation dynamics associated with that harm. That research remained internal while public messaging emphasized positive effects (1). The company had insight into how content exposure and engagement patterns correlated with user outcomes. That knowledge stayed inside the company while parents and practitioners observed harm without access to the underlying mechanisms.

TikTok’s recommendation system is designed to optimize engagement using behavioral signals such as watch time, interactions, and viewing history (2). The platform delivers content as a continuous, personalized sequence, and research shows that engagement is influenced not only by individual videos but also by the order in which they are presented. By adapting recommendations based on user responses, the system increases retention over time. This personalization operates dynamically, adjusting to each user’s evolving behavior and preferences.
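As a deliberately simplified picture of that kind of ranking, the sketch below scores candidate videos by a weighted combination of behavioral signals. The signal names and weights are assumptions chosen for illustration, not TikTok’s actual formula.

```python
# Simplified ranking sketch: score candidates by weighted behavioral signals.
# Weights and signal names are illustrative assumptions, not TikTok's formula.
WEIGHTS = {
    "predicted_watch_time": 0.5,   # expected watch completion, normalized 0..1
    "predicted_interaction": 0.3,  # likelihood of like, share, or comment
    "topic_affinity": 0.2,         # similarity to the user's viewing history
}

def score(candidate: dict) -> float:
    return sum(WEIGHTS[signal] * candidate[signal] for signal in WEIGHTS)

candidates = [
    {"id": "v1", "predicted_watch_time": 0.8, "predicted_interaction": 0.2, "topic_affinity": 0.9},
    {"id": "v2", "predicted_watch_time": 0.6, "predicted_interaction": 0.7, "topic_affinity": 0.4},
]
feed = sorted(candidates, key=score, reverse=True)  # the sequence shown to the user
print([c["id"] for c in feed])
```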

Platforms test changes constantly. An interface adjustment might be tested on millions of users, measured for its effect on engagement, and either implemented or discarded based on results. This testing occurs without user awareness and without external oversight, guided by behavioral data showing what influences user actions. Platforms can test changes like button placement, notification wording, or autoplay behavior, measure the outcomes, and iterate. The optimization targets and decision processes remain proprietary.

What Protection Efforts Work With

Online safety efforts must work with information collected through interviews, case studies, and surveys. These establish specific impacts of technology on individuals and communities: for example, that certain populations show increased depression after platform use, that specific content types correlate with harmful behaviors, and that certain features enable abuse. This produces knowledge that is one level removed from the mechanisms driving the harm.

A practitioner documenting that teenagers exposed to eating disorder content develop disordered eating patterns can track outcomes and correlations. What they cannot access is how the platform identified those users, how recommendation systems amplified exposure, or what signals indicated susceptibility. They see the harm but lack clear insight into the underlying mechanism that produced it.

Researchers face a similar limitation. They can show correlations between usage and harm, document severity, and track trends over time. What they cannot verify is why those correlations exist at the system level. For example, whether algorithms actively amplify harmful content to vulnerable users or whether pre-existing preferences drive exposure remains difficult to determine without access to internal data.

As a result, interventions often target symptoms. Content moderation removes specific posts while underlying recommendation systems continue operating with the same objectives. Digital literacy programs teach users to recognize manipulation tactics but cannot fully explain how those tactics are deployed or optimized at the individual level.

Why This Gap Prevents Effective Protection

You cannot effectively counter influence you cannot see. Protection built without understanding mechanisms addresses symptoms while underlying systems continue optimizing behavior through pathways that remain proprietary.

When advocates push for content moderation, platforms can remove specific posts and report metrics on removal. The systems that determine which users see which content and why usually remain unchanged. Measuring whether those underlying patterns have shifted would require access to internal data that is not publicly available.

Mental health professionals treat symptoms that emerge alongside platform use. They can identify patterns and develop coping strategies, but they cannot access the mechanisms shaping content exposure or engagement. Recommendations such as reducing platform use are based on observed correlations rather than detailed knowledge of system behavior.

Educational programs provide general awareness of engagement tactics. However, without access to data showing how those tactics are applied at the individual level, education remains broad rather than precise.

Why the Data Stays Proprietary

Platforms classify behavioral data as trade secrets and protect it as a competitive advantage. Their recommendation systems and engagement insights are the result of extensive testing and refinement. Sharing this information could allow competitors to replicate their approaches.

This protection extends to aggregated behavioral insights derived from user data. Even if algorithmic logic were shared, the effectiveness of these systems depends on the underlying data showing how users respond. Without that data, external actors cannot fully understand or replicate system behavior.

Platforms sometimes provide limited data access to researchers, but access is controlled. The scope of research, the format of data, and publication conditions can all be influenced by the platform. In some cases, access has been reduced or removed following findings that conflict with platform interests (3).

Privacy is often cited as a reason for restricting access. While privacy protection is a legitimate concern, it is also used to limit external verification of system behavior. The same data used internally for optimization is rarely made available for independent analysis, even in aggregated or privacy-preserving forms.

How This Creates Power Imbalance in Policy

Policy debates rely on competing claims supported by uneven evidence. Platforms use internal behavioral data to support their positions (4). Advocates rely on surveys, case studies, and observed outcomes.

Because platforms control the most detailed data, they can shape how problems are defined in the first place. They can frame outcomes as user preference rather than system influence, present engagement as satisfaction, and argue that observed harm reflects user choice rather than design. These interpretations draw on internal data that external actors cannot access or verify.

This creates an asymmetry that goes beyond evidence. It affects how policy questions are asked and which solutions are considered viable. If regulators cannot see how recommendation systems make decisions or how optimization targets shape exposure, they have to rely on proxies. Those proxies are often incomplete or indirect, which makes it harder to design interventions that address underlying causes rather than visible symptoms.

When platforms claim they have reduced harmful content or improved user experience, verification depends largely on the metrics they choose to share. Those metrics can improve without underlying patterns changing. For example, complaint volume can decline while exposure dynamics remain the same, or harmful content can become harder to detect without actually becoming less prevalent.

This imbalance also influences regulatory design. Rules tend to focus on what can be measured externally, such as content removal rates, reporting requirements, or transparency disclosures. What remains largely unaddressed is how systems decide what to show, to whom, and why. Without access to that level of information, policy addresses outputs while leaving the mechanisms that produce those outputs intact.

The result is a form of accountability that operates within boundaries defined by platforms themselves. Platforms can comply with reporting requirements, meet disclosure standards, and demonstrate improvements against selected indicators, while the underlying systems continue to operate with limited external scrutiny. In this environment, influence is optimized with full visibility, while oversight is built on partial understanding.

The Core Question

This knowledge asymmetry is structural. The companies designing systems that shape behavior are the only ones with full visibility into how those systems operate. Policy is therefore built on partial information. Accountability depends on metrics platforms choose to report rather than independent verification.

Practitioners working with affected communities need more than better surveys or correlations. They need access to data that shows how system design choices produce specific outcomes. Without that, protection efforts will continue to address symptoms rather than underlying mechanisms.

Whether protection can function under these conditions remains an open question. But the current arrangement produces a clear dynamic: platforms optimize influence with full visibility, while protection efforts operate without access to the same level of insight.

I work closely with community-level online safety practitioners and am building accountability infrastructure for tech. Based in Lecce, Italy.

Sources cited in the article:

  1. Facebook / Instagram Internal Research (2021).
    Summary and reporting based on whistleblower disclosures (Frances Haugen) documented here:
    https://www.wsj.com/articles/facebook-knows-instagram-is-toxic-for-teen-girls-company-documents-show-11631620739
  2. Sprout Social. How the TikTok Algorithm Works.
    https://sproutsocial.com/insights/tiktok-algorithm/
  3. Platform Research Access Restrictions (examples: Twitter API, Meta CrowdTangle).
    https://www.nature.com/articles/d41586-023-02143-x
  4. Narayanan, Arvind. Understanding Social Media Recommendation Algorithms. Columbia University (Academic Commons).
    https://academiccommons.columbia.edu/doi/10.7916/khdk-m460

 

Selected References and Further Reading

While this post focuses on the structural gap between platform knowledge and public understanding, the issue connects to broader conversations on safety, accountability, and real-world impact. For those interested in exploring these themes further, the SHIELD 2026 Global Online Safety Conference reference document provides additional perspectives from practitioners, researchers, and policymakers. You can find it here: From the Field – SHIELD.

 

TikTok and Short-Form Video Systems

  • Masood et al. Counting How the Seconds Count: Understanding Algorithm-User Interplay in TikTok (arXiv)
  • Zannettou et al. Analyzing User Engagement with TikTok’s Short Video Recommendations (CHI 2024)
  • Salles, Julia. Affect and Prediction in Short-Video Recommendation Systems (New Media & Society)

Engagement and Recommendation Systems

  • Lan, Yifei. Impact of Recommendation Algorithms on User Stickiness (TikTok)
  • Meta Research. Retentive Relevance: Capturing Long-Term User Value in Recommendation Systems

Platform Strategy and Behavioral Data

  • Analyses of ByteDance’s “interest graph” and engagement-first architecture
  • Industry and academic work on behavioral signals such as watch time, interaction, and retention

Research Access and Platform Control

  • Reporting on API restrictions, research tool shutdowns, and limits on independent data access
  • Analyses of how platform-controlled data sharing shapes what external researchers can study