{"id":1,"date":"2025-04-10T18:40:30","date_gmt":"2025-04-10T18:40:30","guid":{"rendered":"https:\/\/blogs.tees.ac.uk\/chiopara\/?p=1"},"modified":"2025-04-14T11:40:41","modified_gmt":"2025-04-14T10:40:41","slug":"eval-spam-filters","status":"publish","type":"post","link":"https:\/\/blogs.tees.ac.uk\/chiopara\/2025\/04\/10\/eval-spam-filters\/","title":{"rendered":"Evaluating spam filters and Stylometric Detection of AI-generated phishing emails"},"content":{"rendered":"<h3><strong>The Rise of AI-Generated Phishing<\/strong><\/h3>\n<p>Large Language Models (LLMs) like GPT-4 are transforming how we communicate, but not always for the better. While these tools can streamline everything from writing emails to coding, they\u2019re also being misused by cybercriminals to craft phishing emails that are alarmingly convincing. These AI-generated messages mirror natural human language so closely that many traditional spam filters, typically tuned to catch suspicious links or known domains, are no longer enough.<\/p>\n<p>Our recent study, published in Expert Systems with Applications under the title <span style=\"font-size: 18pt\"><a href=\"https:\/\/doi.org\/10.1016\/j.eswa.2025.127044\"><strong><span style=\"color: #ff6600\"><em>Evaluating spam filters and Stylometric Detection of AI-generated phishing emails<\/em><\/span><\/strong><\/a><\/span>, highlights this growing threat and sheds light on a potential game-changer: <strong>stylometric detection<\/strong>. 
By analysing the unique &#8220;writing fingerprint&#8221; of AI-generated versus human-written content, this technique offers a promising way forward in the fight against smarter phishing attacks.<\/p>\n<h3><strong>Key Findings: How Email Providers Stack Up<\/strong><\/h3>\n<p>In the paper, we tested 63 AI-generated phishing emails (created using GPT-4) across three major platforms:<\/p>\n<ul>\n<li><strong>Yahoo<\/strong>\u00a0blocked\u00a0<strong>90%<\/strong>\u00a0of phishing attempts, showcasing robust filtering.<\/li>\n<li><strong>Gmail<\/strong>\u00a0allowed\u00a0<strong>86%<\/strong>\u00a0of malicious emails to bypass its spam filters.<\/li>\n<li><strong>Outlook<\/strong>\u00a0performed worst, letting\u00a0<strong>96%<\/strong>\u00a0of phishing content through.<\/li>\n<\/ul>\n<p>Even more alarming? When we sent\u00a0<em>legitimate<\/em>\u00a0AI-generated emails:<\/p>\n<ul>\n<li>Yahoo falsely flagged\u00a0<strong>58\u201366%<\/strong>\u00a0as spam.<\/li>\n<li>Outlook allowed all legitimate emails through, but its permissiveness with phishing attempts raises red flags.<\/li>\n<\/ul>\n<p><strong>The takeaway<\/strong>: We observed that current filters prioritise minimising false positives (legitimate emails marked as spam) at the cost of security, a risky trade-off as AI phishing evolves.<\/p>\n<h3><strong>The Stylometric Solution: Catching Phishing by Language Patterns<\/strong><\/h3>\n<p>To combat AI-generated threats, the study introduced\u00a0<strong>47 new stylometric features<\/strong>, linguistic markers that analyse writing style, structure, and tone. 
These include:<\/p>\n<ul>\n<li><strong>Imperative verbs<\/strong>\u00a0(e.g., \u201cclick,\u201d \u201cverify\u201d) driving urgency.<\/li>\n<li><strong>Clause density<\/strong>\u00a0as a measure of sentence complexity.<\/li>\n<li><strong>Pronoun usage<\/strong>\u00a0(e.g., overuse of \u201cwe\u201d or \u201cyou\u201d) to mimic authority.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-6 size-full\" src=\"https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/features_extraction.png\" alt=\"\" width=\"1297\" height=\"537\" srcset=\"https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/features_extraction.png 1297w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/features_extraction-300x124.png 300w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/features_extraction-1024x424.png 1024w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/features_extraction-768x318.png 768w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/features_extraction-1200x497.png 1200w\" sizes=\"(max-width: 1297px) 100vw, 1297px\" \/><\/p>\n<p>When these features were evaluated with machine learning models:<\/p>\n<ul>\n<li><strong>XGBoost<\/strong>\u00a0outperformed the other models with\u00a0<strong>96% accuracy<\/strong>\u00a0and a near-perfect\u00a0<strong>99% AUC score<\/strong>.<\/li>\n<li><strong>Urgency markers<\/strong>\u00a0and\u00a0<strong>sentence complexity<\/strong>\u00a0were critical in flagging phishing content.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-7 size-full\" src=\"https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/Top_10_Features_2.png\" alt=\"\" width=\"1984\" height=\"1180\" srcset=\"https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/Top_10_Features_2.png 1984w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/Top_10_Features_2-300x178.png 300w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/Top_10_Features_2-1024x609.png 1024w, 
https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/Top_10_Features_2-768x457.png 768w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/Top_10_Features_2-1536x914.png 1536w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/Top_10_Features_2-1200x714.png 1200w, https:\/\/blogs.tees.ac.uk\/chiopara\/files\/2025\/04\/Top_10_Features_2-1980x1178.png 1980w\" sizes=\"(max-width: 1984px) 100vw, 1984px\" \/><\/p>\n<h3><strong>Why Stylometrics Matter<\/strong><\/h3>\n<p>Traditional phishing detection relies on external signals like suspicious links. But AI-generated emails often omit these, relying instead on psychological manipulation. Stylometrics offers a\u00a0<em>text-based defense<\/em>:<\/p>\n<ul>\n<li>Detects\u00a0<strong>zero-day attacks<\/strong>\u00a0lacking known malicious links.<\/li>\n<li>Provides\u00a0<strong>transparent insights<\/strong>\u00a0(unlike \u201cblack-box\u201d AI models).<\/li>\n<li>Complements existing tools for multi-layered security.<\/li>\n<\/ul>\n<h3><strong>Limitations and Future Directions<\/strong><\/h3>\n<ul>\n<li><strong>Small dataset<\/strong>: Only 63 phishing\/legitimate emails were tested.<\/li>\n<li><strong>Provider bias<\/strong>: Results may not generalize to all email services.<\/li>\n<li><strong>Model focus<\/strong>: GPT-4 was the sole LLM used; future work could explore Claude, Gemini, or Llama.<\/li>\n<\/ul>\n<h3><strong>Explore the Research<\/strong>:<\/h3>\n<ul>\n<li>Read the full paper:\u00a0<a href=\"https:\/\/doi.org\/10.1016\/j.eswa.2025.127044\">Expert Systems with Applications, Volume 276<\/a><\/li>\n<\/ul>\n<p>Authors: <a href=\"https:\/\/research.tees.ac.uk\/en\/persons\/chi-opara\">Chidimma (Chi) Opara<\/a>, <a href=\"https:\/\/research.tees.ac.uk\/en\/persons\/paolo-modesti\">Paolo Modesti<\/a> and <a href=\"https:\/\/research.tees.ac.uk\/en\/persons\/lewis-golightly\">Lewis Golightly<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Rise of AI-Generated Phishing Large Language Models 
(LLMs) like GPT-4 are transforming how we communicate, but not always for the better. While these tools can streamline everything from writing emails to coding, they\u2019re also being misused by cybercriminals to craft phishing emails that are alarmingly convincing. These AI-generated messages mirror natural human language so [&hellip;]<\/p>\n","protected":false},"author":25362,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_jetpack_memberships_contains_paid_content":false},"categories":[1],"tags":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/posts\/1"}],"collection":[{"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/users\/25362"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/comments?post=1"}],"version-history":[{"count":2,"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/posts\/1\/revisions"}],"predecessor-version":[{"id":352,"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/posts\/1\/revisions\/352"}],"wp:attachment":[{"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/media?parent=1"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/categories?post=1"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.tees.ac.uk\/chiopara\/wp-json\/wp\/v2\/tags?post=1"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}