Overall Safety Score
Higher percentages indicate a higher likelihood of harmful content
Help Us Improve! Rate the Analysis
Your feedback helps train better AI safety models
How Scoring Works:
- Percentages represent the likelihood of harmful content
- Higher % = more likely to be harmful
- 0-40%: Content appears safe
- 40-70%: Potentially concerning content that warrants review
- 70-100%: High likelihood of policy violation (see the sketch below for how these bands map to labels)
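A minimal sketch of this banding, assuming scores arrive as floats between 0 and 1; the helper name `score_to_label` and the exact handling of the 40% and 70% boundaries are illustrative assumptions, not the demo's actual code:

```python
def score_to_label(score: float) -> str:
    """Map a harm-likelihood score (0.0-1.0) to the bands described above.

    Hypothetical helper for illustration; the demo's real thresholds
    and boundary handling may differ.
    """
    percent = score * 100
    if percent < 40:
        return "appears safe"
    if percent < 70:
        return "potentially concerning - warrants review"
    return "high likelihood of policy violation"


# Example: a score of 0.83 (83%) falls in the 70-100% band.
print(score_to_label(0.83))  # -> high likelihood of policy violation
```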
Content Categories (Singapore Context):
- Hateful: Content targeting Singapore's protected traits (e.g., race, religion), including discriminatory remarks and explicit calls for harm/violence.
- Insults: Personal attacks on non-protected attributes (e.g., appearance). Note: attacks based on sexuality are classified as insults, not hateful content, in the Singapore context.
- Sexual: Sexual content or adult themes, ranging from mild content inappropriate for minors to explicit content inappropriate for general audiences.
- Physical Violence: Threats, descriptions, or glorification of physical harm against individuals or groups (not property damage).
- Self Harm: Content about self-harm or suicide, including ideation, encouragement, or descriptions of ongoing actions.
- All Other Misconduct: Unethical/criminal conduct not covered above, from socially condemned behavior to clearly illegal activities under Singapore law (see the sketch after this list).
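To make the category output concrete, here is a minimal sketch of how one analysis result might be reported, assuming each category yields an independent 0-1 likelihood. The category keys, the example values, and the choice to take the maximum category score as the overall safety score are illustrative assumptions, not the demo's confirmed behavior:

```python
from typing import Dict

# Hypothetical per-category scores for one analysed text (0.0-1.0 each);
# the keys mirror the category list above, the values are made up.
category_scores: Dict[str, float] = {
    "hateful": 0.05,
    "insults": 0.72,
    "sexual": 0.02,
    "physical_violence": 0.01,
    "self_harm": 0.00,
    "all_other_misconduct": 0.10,
}

# Assumption for illustration: treat the highest single category score as the
# overall safety score, since one violation is enough to flag the text.
overall = max(category_scores.values())

for name, score in sorted(category_scores.items(), key=lambda kv: kv[1], reverse=True):
    flag = "review/violation" if score >= 0.40 else "ok"
    print(f"{name:>22}: {score:6.1%}  {flag}")

print(f"Overall safety score: {overall:.1%}")  # 72.0% -> warrants review
```

Whether the overall score is actually aggregated as a maximum, a weighted combination, or a separate model output is not stated here; the maximum is simply the reading most consistent with the banding described above.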