Evaluating Jailbreak Vulnerabilities in LLMs: A Taxonomy and Comparative Analysis in Romance Fraud Scenarios

Parra, Yohn Jairo; Chi, Hongmei; Alo, Richard A.; Lima, Vinicius (2026). 2026 IEEE 5th International Conference on AI in Cybersecurity (ICAIC)

DOI: 10.1109/icaic67076.2026.11395743

Type: Proceedings Article

Country: United States

Tags: jailbreak prompts, romance fraud, LLM safety, benchmarking, comparative analysis, attack strategy taxonomy, multi-model evaluation, taxonomy, attack vectors, latency, jailbreak, risk assessment, attack surface, prompt engineering, ethics

Synopsis (AI-Generated)

This piece, "Evaluating Jailbreak Vulnerabilities in LLMs: A Taxonomy and Comparative Analysis in Romance Fraud Scenarios", is framed within the broader context of online fraud and mediated communication. It outlines common patterns documented in the literature, describes how offenders cultivate trust and shift interactions onto channels they control, and notes the role of staged identities, persuasive scripts, and escalating commitment. The discussion places these elements alongside themes frequently reported by victims, including emotional grooming, urgency cues, and isolation from outside advice. The work also highlights typical areas of inquiry for researchers and practitioners: factors associated with victim susceptibility, the influence of platform affordances, and touchpoints where prevention or disruption is most feasible. Attention is given to reporting barriers, financial harms, and downstream impacts on wellbeing. The implications emphasize the value of cross-sector collaboration, clearer platform policy enforcement, and targeted awareness strategies informed by real case dynamics. Presented at the 2026 IEEE 5th International Conference on AI in Cybersecurity (ICAIC), the piece contributes to ongoing efforts to translate observed scam mechanics into actionable guidance for detection, education, and support.

Identified Gaps (AI-Generated)

Gaps identified include: (1) blind spots in the taxonomy, where some tactics are underrepresented; (2) cross-model averages that mask model-specific vulnerabilities; (3) limited analysis of obfuscation- and translation-based attacks; (4) no real-world deployment or user study; (5) a need for adaptive defenses that evolve with jailbreak strategies; (6) insufficient exploration of the contextual factors driving susceptibility; (7) dual-use risk mitigation that could be strengthened.

Methods (AI-Generated)

The study uses a taxonomy-driven benchmark to assess jailbreak vulnerabilities of LLMs in romance-fraud contexts. A modular testing framework runs a prompt–response matrix across three LLM families (OpenAI GPT-4/ChatGPT, Gemini 1.5 Flash, Claude Sonnet). The benchmark includes 80 romance-fraud prompts across eight scenarios, of which 60 are used for ASR analysis. Prompts are sorted into seven strategy primitives (e.g., role-play, agent mimicry, policy translation, sudo mode, jailbreaking, format smuggling, obfuscation). Metrics include Attack Success Rate (ASR), refusal rate, and latency, with pairwise ΔASR and confusion analyses used to identify model weaknesses. All results are logged for reproducibility.
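
The metrics named above can be illustrated with a minimal Python sketch. It is not the authors' harness: the record fields (`model`, `refused`, `latency_s`), the placeholder model names, and the sample values are all hypothetical. It also assumes that ASR is the complement of the refusal rate, which follows from the note under Limitations that ASR is proxied via refusal detection, and may differ from the paper's exact definition.

```python
from collections import defaultdict
from itertools import combinations


def summarize(records):
    """Per-model ASR, refusal rate, and mean latency from logged runs.

    Assumption: a response counts as a successful attack when it was
    NOT flagged as a refusal, so ASR = 1 - refusal rate.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["model"]].append(rec)

    summary = {}
    for model, rows in grouped.items():
        n = len(rows)
        refusals = sum(1 for r in rows if r["refused"])
        summary[model] = {
            "n": n,
            "refusal_rate": refusals / n,
            "asr": (n - refusals) / n,
            "mean_latency_s": sum(r["latency_s"] for r in rows) / n,
        }
    return summary


def pairwise_delta_asr(summary):
    """ASR difference (first minus second) for each unordered model pair."""
    return {
        (a, b): summary[a]["asr"] - summary[b]["asr"]
        for a, b in combinations(sorted(summary), 2)
    }


# Hypothetical log entries; the values are illustrative, not results.
records = [
    {"model": "model_a", "refused": False, "latency_s": 1.0},
    {"model": "model_a", "refused": False, "latency_s": 2.0},
    {"model": "model_a", "refused": True, "latency_s": 1.0},
    {"model": "model_a", "refused": False, "latency_s": 2.0},
    {"model": "model_b", "refused": True, "latency_s": 3.0},
    {"model": "model_b", "refused": True, "latency_s": 3.0},
    {"model": "model_b", "refused": False, "latency_s": 3.0},
    {"model": "model_b", "refused": True, "latency_s": 3.0},
]

summary = summarize(records)
deltas = pairwise_delta_asr(summary)
```

Keeping per-model figures alongside the pairwise differences matters because a single cross-model average can hide the model-specific weaknesses listed under Identified Gaps.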

Limitations (AI-Generated)

The evaluation relies on three model versions with fixed safety settings, which limits generalizability to other models or later updates. The 80-prompt dataset may not cover all romance-fraud tactics, languages, or real-user dynamics. ASR is proxied via regex-based refusal detection, which may not map perfectly onto safety outcomes. Single-turn prompts and a controlled harness may not reflect interactive deception. Latency depends on infrastructure. Ethical considerations constrain how much content detail is included. The absence of human participants limits insight into user vulnerability.
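
The caveat about regex-based refusal detection can be made concrete with a small Python sketch. The patterns below are hypothetical and are not taken from the paper. They only show how such a proxy can both miss a refusal phrased in unexpected words and flag a compliant reply that happens to contain a refusal-like phrase.

```python
import re

# Hypothetical refusal patterns, chosen for illustration only.
REFUSAL_PATTERNS = [
    r"\bI\s+(?:can't|cannot|won't)\b",
    r"\bI'm\s+sorry\b",
    r"\bunable\s+to\s+(?:help|assist|comply)\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in REFUSAL_PATTERNS]


def is_refusal(text):
    """Return True if any refusal pattern matches the response text."""
    # Normalize typographic apostrophes so "can’t" matches "can't".
    normalized = text.replace("\u2019", "'")
    return any(p.search(normalized) for p in _COMPILED)


# A refusal phrased in the expected way is detected.
detected = is_refusal("I'm sorry, but I can't help with that.")

# A refusal in unanticipated wording is missed (false negative).
missed = is_refusal("That is not something I am able to assist with.")

# A compliant reply with a refusal-like phrase is flagged (false positive).
flagged = is_refusal("I can't wait to tell you more!")
```

Under these assumed patterns, the missed case would be counted as a successful attack and the flagged case as a refusal, so any ASR figure derived from such a proxy inherits both kinds of error.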

Future Work (AI-Generated)

Future work should examine how user behavior and platform policies interact with romance-fraud prompts, and develop targeted interventions at both individual and systemic levels to deter abuse and improve safety.

AI-Generated Content Notice

The synopsis and research notes on this page were generated with AI from available publication information and, when available, the uploaded paper text. They may contain errors, omissions, or interpretation issues. Readers should follow the DOI or source link, review the original publication, and make their own judgment about the content.
