Overall Safety Score

Higher percentages indicate higher likelihood of harmful content

Enter text to analyze
Category scores will appear here after analysis

🎯 Help Us Improve! Rate the Analysis

Your feedback trains better AI safety models

โฌ†๏ธ Analyze some text above to unlock voting โฌ†๏ธ

How Scoring Works:

  • Percentages represent the likelihood of harmful content: higher % = more likely to be harmful
  • Below 40%: content appears safe
  • 40-70%: potentially concerning content that warrants review
  • Above 70%: high likelihood of a policy violation
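The banding above can be sketched as a small helper. This is a minimal illustration, assuming the analyzer returns a harmfulness probability in [0, 1]; the function name and band labels are illustrative, not the app's actual API:

```python
def risk_band(score: float) -> str:
    """Map a harmfulness probability (0.0-1.0) to the bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < 0.40:
        return "safe"              # content appears safe
    if score < 0.70:
        return "review"            # potentially concerning, warrants review
    return "likely_violation"      # high likelihood of a policy violation
```

A score of 0.55 (55%) would therefore fall in the "review" band.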

Content Categories (Singapore Context):

  • 🤬 Hateful: Content targeting Singapore's protected traits (e.g., race, religion), including discriminatory remarks and explicit calls for harm/violence.
  • 💢 Insults: Personal attacks on non-protected attributes (e.g., appearance). Note: Sexuality attacks are classified as insults, not hateful, in Singapore.
  • 🔞 Sexual: Sexual content or adult themes, ranging from mild content inappropriate for minors to explicit content inappropriate for general audiences.
  • ⚔️ Physical Violence: Threats, descriptions, or glorification of physical harm against individuals or groups (not property damage).
  • ☹️ Self Harm: Content about self-harm or suicide, including ideation, encouragement, or descriptions of ongoing actions.
  • 🙅‍♀️ All Other Misconduct: Unethical/criminal conduct not covered above, from socially condemned behavior to clearly illegal activities under Singapore law.