Why most Arabic chatbots will fail compliance in 2026
Context: deployment wave without a compliance funnel
What happened in MENA during 2024-2025: every major bank, telco, ministry, insurer, and large hospital announced an “AI chat assistant.” Press releases piled up. POCs on the ChatGPT API, Claude API, Gemini API, ALLaM (SDAIA, launched May 2024 on watsonx and Azure) [1], Falcon (TII, UAE), and Jais (Inception/G42 + MBZUAI) moved into production. Local and regional vendors sold “ready-made solutions” without serious testing on dialect, fiqh, or data privacy.
In 2026 the bill comes due. PDPL in Saudi Arabia entered full enforcement on 14 September 2024 [2], followed by a SDAIA public consultation on proposed amendments to the Implementing Regulations that closed on 27 May 2025 [3] (see our detailed PDPL read-through). The UAE, Qatar, and Bahrain are tracking. SAMA folds AI governance into its existing Cyber Security Framework and consumer-protection oversight rather than a standalone AI-specific instrument [4]. CST (formerly CITC, renamed under Cabinet Decree No. 235 of 1444 H) [5] now oversees telecom and digital infrastructure. The compliance officer who signed off on the chatbot launch in 2024 with no DPIA now finds themselves on the other side of a hard internal review.
This piece is a diagnosis, not a sales deck. I am writing from hands-on work with NLP production teams across the region, where I have seen the same patterns repeat. The six failure modes below are not hypothetical; they are what any serious review of an Arabic chatbot in production in MENA today will find. For the operating view of PDPL, see our compliance guide, which is designed around PDPL principles.
Reason 1: conversation logs treated as analytics data, not personal data
This is the largest compliance trap I see in the region. The engineering team assumes the chat transcript is “product content” and parks it in a data warehouse with everything else. The compliance officer assumes — correctly — that it is personal data governed by PDPL.
Under PDPL the rule is unambiguous: any data that identifies a data subject, or could identify them by linkage — including chat text containing a name, an account number, a phone number, a health complaint, a legal question — is personal data. A chatbot log from an Islamic bank covering a murabaha financing enquiry is financially sensitive personal data, which means you owe the subject:
- A lawful basis for processing: explicit consent, contract, legal obligation, or legitimate interest backed by a DPIA
- A retention policy: defined duration, automatic deletion
- A right of access: the customer asks for their transcript, and you must deliver inside 30 days (extendable by up to 30 further days under defined conditions) [6]
- A right of erasure: the customer asks to delete it, and you must execute (with the usual statutory carve-outs)
- Cross-border transfer controls: if logs sit on servers outside the Kingdom, a risk assessment plus an approved transfer mechanism (SCCs, BCRs, adequacy, or explicit consent) [7]
- A DPO: mandatory for public entities, controllers whose core activities involve regular monitoring at scale, and controllers whose core activities involve sensitive personal data [8]
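The retention and access obligations in the list above lend themselves to a small data model. What follows is a minimal Python sketch under stated assumptions, not a statement of how a PDPL-regulated system must be built: `ChatLogRecord`, its field names, and the storage-region labels are hypothetical, and the retention period is whatever the institution's own policy sets. Only the 30-day response window and the 30-day extension come from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ACCESS_RESPONSE_DAYS = 30   # from the text: deliver inside 30 days
ACCESS_EXTENSION_DAYS = 30  # from the text: up to 30 further days, under defined conditions


@dataclass(frozen=True)
class ChatLogRecord:
    """Hypothetical metadata kept alongside every stored conversation."""
    conversation_id: str
    created_on: date
    lawful_basis: str      # e.g. "explicit_consent" or "contract" (illustrative labels)
    storage_region: str    # e.g. "in-kingdom" (illustrative label)
    retention_days: int    # set by the institution's retention policy

    def delete_after(self) -> date:
        """Date on which automatic deletion should run."""
        return self.created_on + timedelta(days=self.retention_days)

    def is_due_for_deletion(self, today: date) -> bool:
        return today >= self.delete_after()


def access_request_deadline(received_on: date, extended: bool = False) -> date:
    """Latest date for answering a data subject access request."""
    days = ACCESS_RESPONSE_DAYS + (ACCESS_EXTENSION_DAYS if extended else 0)
    return received_on + timedelta(days=days)
```

With an illustrative 90-day retention period, a record created on 1 January 2026 becomes due for deletion on 1 April 2026. An access request received on 1 March 2026 would be due by 31 March, or by 30 April if the extension conditions apply.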
What actually happens: the foreign chatbot vendor retains every conversation on Azure US-East or GCP us-central1 for fine-tuning. No one asked. No one signed a DPIA. When the PDPL inspector arrives, that is a clear-cut violation. For institutions in telecom, see PDPL’s impact on telecom AI deployments.
Reason 2: Sharia and religious overreach
General-purpose models — GPT-4, Claude, Gemini — are trained on broad Arabic corpora that include religious text, but none of the major vendors publishes a per-madhhab capability disclosure, evaluation benchmark, or guarantee that the model can distinguish between:
- The major Sunni schools of jurisprudence (Hanafi, Maliki, Shafi’i, Hanbali), the Twelver and Zaydi Shia traditions, the Ibadi tradition (notably present in Oman and the Mzab valley in Algeria), and the Druze community in the Levant — each carrying its own jurisprudential framework
- A fatwa from Dar Al-Ifta Egypt versus the Saudi Council of Senior Scholars versus the UAE Fatwa Council versus the relevant Marja’ for a Shia customer
- An accredited fatwa versus a personal opinion from a sheikh on YouTube
- Qur’anic text versus hadith versus juristic reasoning
The customer asks, “Is this loan halal?” Without madhhab-aware grounding, the chatbot can synthesise an answer that draws across schools without flagging which one applies. If the chatbot belongs to an Islamic bank, that is a violation of the in-house Sharia board’s authority — the chatbot is not entitled to issue a fiqh ruling without sign-off. If the question concerns women, inheritance, marriage, or divorce, the answer can wound a customer and expose the institution to social fallout.
(I am not a Sharia scholar. I write this as an operator describing the engineering boundary; the claim above is observational from operator audits, not from a published vendor benchmark. Authoritative rulings belong to qualified scholars.)
The pattern that fixes this is not “improve the model” — it is a clean cutout that removes fiqh and religion from the chatbot entirely and routes such questions to a vetted human path. See the boundary line for AI in Islamic finance for the deeper read.
Reason 3: dialect mismatch
The Egyptian customer types: “محتاج أعرف رصيدي” (I need to know my balance). The chatbot, trained on MSA plus a heavy Saudi corpus, answers: “حضرتك تفضل بتزويدنا برقم البطاقة الوطنية” (roughly, “Kindly provide us with your national ID card number”), in a register that lands wrong. The customer disengages, calls the contact center, and posts on X that “this bank’s AI doesn’t get anything.” MarComms scrambles to contain it. NPS drops.
The worst part: nobody is measuring this automatically. The vendor reports “successful conversation closure” — because the customer closed the window — as a success metric. The reality is that they closed it because they failed to parse the reply.
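The gap between “closure” and actual resolution can be made measurable. The sketch below is one possible way to do it, not a vendor metric: the `Conversation` fields and the idea of using a repeat contact within 24 hours as a failure signal are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Conversation:
    conversation_id: str
    closed_by_customer: bool
    escalated_to_human: bool
    recontacted_within_24h: bool  # hypothetical signal: same customer called or wrote again


def closure_rate(conversations: Sequence[Conversation]) -> float:
    """Share of conversations the customer closed (the metric criticised above)."""
    if not conversations:
        return 0.0
    closed = sum(c.closed_by_customer for c in conversations)
    return closed / len(conversations)


def resolution_rate(conversations: Sequence[Conversation]) -> float:
    """Share closed with no human escalation and no repeat contact afterwards."""
    if not conversations:
        return 0.0
    resolved = sum(
        c.closed_by_customer
        and not c.escalated_to_human
        and not c.recontacted_within_24h
        for c in conversations
    )
    return resolved / len(conversations)
```

Reporting both numbers side by side shows how much of the “success” is customers who gave up and went to another channel.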
The number of production-relevant Arabic dialects in MENA is at least seven: Egyptian, Gulf (Saudi, Emirati, Kuwaiti, Qatari, Bahraini), Levantine (Syrian, Lebanese, Jordanian, Palestinian), Iraqi, Maghrebi (Moroccan, Algerian, Tunisian), Yemeni, and Sudanese [9]. Each carries its own banking, medical, and legal lexicon. The general model does not handle that with production-grade quality. See our Arabic LLM commercial failure diagnosis.
Reason 4: hallucinated advice in regulated sectors
Banking, medicine, law, insurance, pharma. From operator audits across 2024-2025, Arabic chatbots in production routinely produce answers that fall into one of three failure categories:
- Material factual error — a fictitious account number, an interest rate the bank does not actually offer, a medical diagnosis with no exam, a legal citation to a statute that does not exist
- Out-of-scope advice — saying “I recommend you take drug X” when the only correct response is “see a physician”
- Unauthorized commitment on behalf of the institution — “we agree to waive your fee” with no authority to do so
Each of these moves toward regulator action. SAMA asks the bank why the chatbot quoted an unadvertised rate. SFDA and the Egyptian Ministry of Health ask about automated medical referrals. The Bar Association asks about hallucinated legal citations.
The guardrail that fixes this is not “pick a more accurate model” — it is a whitelist of permitted topics, a rejection model that refuses everything outside it, and a human escalation path for every rejection.
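The whitelist-plus-escalation pattern described above can be sketched as a default-deny router. This is an illustrative outline only: the topic names, the `Routing` type, and the `keyword_classifier` stand-in are hypothetical, and a production system would replace the toy classifier with a properly evaluated rejection model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical whitelist; the real one is approved by business and compliance.
ALLOWED_TOPICS = frozenset({"balance_enquiry", "branch_hours", "card_activation"})


@dataclass(frozen=True)
class Routing:
    action: str  # "answer" or "escalate"
    topic: str
    reason: str


def route(message: str, classify: Callable[[str], str]) -> Routing:
    """Default-deny: only whitelisted topics reach the model; the rest go to a human."""
    topic = classify(message)
    if topic in ALLOWED_TOPICS:
        return Routing("answer", topic, "topic on whitelist")
    return Routing("escalate", topic, "topic not on whitelist; hand over to a human")


def keyword_classifier(message: str) -> str:
    """Toy stand-in for a real topic classifier, for illustration only."""
    text = message.lower()
    if "balance" in text:
        return "balance_enquiry"
    if "halal" in text or "fatwa" in text:
        return "fiqh"
    return "unknown"
```

The design choice that matters is the direction of the default: an unrecognised topic is escalated rather than answered, so a classifier miss produces a human handover instead of an unauthorised reply.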
Reason 5: no audit trail for automated decisions
Under PDPL, fully automated decision-making is a flagged high-risk processing activity: it requires explicit consent and, where the decision produces significant effects on the data subject, additional safeguards and a DPIA [10]. If the chatbot refuses a loan, closes an account, or declines an insurance claim, the institution must retain:
- The full record of what the customer said
- The full record of what the model replied
- A copy of the specific model version used
- A copy of the system prompt and the guardrails active at that moment
- The escalation path that was taken
- A human signature on the final decision
What I see in practice: the vendor keeps text-only logs, with no version control on the model, no snapshot of the prompt, and no link to the final decision record. When a PDPL auditor arrives and asks, “Show me the decision the chatbot made on date X for customer Y, and how it got there,” there is no answer. That is a direct violation.
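One way to make the retained items above concrete is a single record per automated decision. The following is a hedged sketch rather than a prescribed schema: `DecisionAuditRecord` and all of its field names are hypothetical, and storing SHA-256 fingerprints assumes the full system prompt and guardrail configuration are kept under version control and can be looked up by hash.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


def sha256_hex(text: str) -> str:
    """Fingerprint used to pin the exact prompt or guardrail config in force."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DecisionAuditRecord:
    decision_id: str
    customer_ref: str
    timestamp_utc: str            # ISO 8601 string
    customer_message: str         # full record of what the customer said
    model_reply: str              # full record of what the model replied
    model_version: str            # specific model version used
    system_prompt_sha256: str     # points at the stored prompt snapshot
    guardrail_config_sha256: str  # points at the stored guardrail snapshot
    escalation_path: str          # escalation path that was taken
    human_approver: Optional[str] # human sign-off on the final decision

    def missing_fields(self) -> List[str]:
        """Names of fields that are empty or None; an empty list means complete."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialise for append-only storage; Arabic text is kept readable."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```

A record with no `human_approver` reports that field as missing, which gives a simple gate: a decision is not final until `missing_fields()` returns an empty list.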
Reason 6: missing or cosmetic DPIA
Article 22 of the PDPL Implementing Regulations mandates a DPIA for high-risk processing, particularly where processing involves sensitive data, automated decisions, or large-scale processing tied to products and services [11]. A chatbot handling financial, health, and legal data for thousands of customers daily is, by definition, high-risk processing. Yet every chatbot I have audited has either no DPIA at all or a “paper” DPIA the legal team threw together in a single day before launch to close a ticket.
A real DPIA requires:
- A processing description — data ingested, data stored, third parties with access
- Necessity and proportionality assessment — do you need all this data? Is a less intrusive alternative available?
- Risk assessment to the data subject — leakage, discrimination, error, unintended outcomes
- Mitigation measures — encryption, time limits, deletion, human review
- DPO consultation — formal sign-off
Nobody does this for the chatbot. When the PDPL review lands, the inspector takes the paper DPIA and rejects it.
What institutions should do: a test rubric
The fix. If you are a compliance officer, head of digital, or CTO at a bank, telco, insurer, or hospital in MENA, this checklist is the floor before launching any Arabic chatbot in 2026.
Before launch
- Full DPIA — signed by the DPO, reviewed by legal, addressing all six failure modes above
- Data map — where each conversation is stored, for how long, who can access it, across which borders
- Vendor DPA — committing to data subject rights, breach notification, audit rights
- Topic whitelist — approved by business, compliance, and the Sharia board (if Islamic)
- Rejection model — trained to refuse everything outside the whitelist
- Escalation path — explicit, with an SLA, to a real human
Pre-launch testing
- Dialect test — at least 7 dialects, 200 questions each, human-reviewed
- Fiqh test — for Islamic finance, 100 fiqh questions reviewed with the Sharia board
- Hallucination test — 200 out-of-scope questions, measure the correct-rejection rate
- Privacy test — attempt to extract a previous customer’s data from the model
- Escalation test — measure handover time to humans in critical cases
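The hallucination and dialect tests above both reduce to scoring human-labelled probe questions. A minimal sketch of the correct-rejection rate, broken down by dialect, might look like this; `ProbeResult` and its fields are hypothetical names, and the labels are assumed to come from human reviewers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ProbeResult:
    dialect: str       # e.g. "egyptian", "gulf"
    in_scope: bool     # reviewer label: is the question on the topic whitelist?
    bot_refused: bool  # observed: did the chatbot refuse or escalate?


def correct_rejection_rate(results: Sequence[ProbeResult]) -> float:
    """Share of out-of-scope questions that the chatbot refused."""
    out_of_scope = [r for r in results if not r.in_scope]
    if not out_of_scope:
        return 0.0
    return sum(r.bot_refused for r in out_of_scope) / len(out_of_scope)


def rate_by_dialect(results: Sequence[ProbeResult]) -> Dict[str, float]:
    """Same rate, computed separately for each dialect in the test set."""
    groups: Dict[str, List[ProbeResult]] = defaultdict(list)
    for r in results:
        groups[r.dialect].append(r)
    return {dialect: correct_rejection_rate(g) for dialect, g in groups.items()}
```

Splitting by dialect matters because an acceptable overall rate can hide one dialect in which out-of-scope questions slip through.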
After launch
- Daily monitoring — sample 1% of conversations, human review for quality
- Monthly pattern review — any new topic surfacing, any complaints recurring
- Quarterly retraining — on dialect data and real human responses
- Audit trail retained — every decision, every model version, every prompt
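For the 1% review sample, one option is deterministic hash-based selection, so that the same conversation is always in or out of the sample and the choice can be reproduced during an audit. This is a sketch of that technique under stated assumptions: the function name, the `salt` parameter, and the idea of rotating the salt per review period are illustrative and not taken from the text.

```python
import hashlib


def in_review_sample(conversation_id: str, rate: float = 0.01, salt: str = "2026-01") -> bool:
    """Return True if this conversation falls in the human-review sample.

    The decision depends only on the ID, the salt, and the rate, so it is
    reproducible. The salt is a hypothetical per-period value that changes
    which conversations are picked without changing the sampling rate.
    """
    digest = hashlib.sha256(f"{salt}:{conversation_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    threshold = int(rate * 2**64)               # integer comparison avoids float edge cases
    return bucket < threshold
```

Compared with random sampling at read time, this approach needs no stored state and lets a reviewer confirm after the fact that a given conversation was or was not eligible for review.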
Who has to do what
Compliance officer — enforce a real DPIA, refuse launch if it is incomplete, retain a copy of the vendor DPA.
CTO — enforce version control on the model and the prompt, enforce complete audit logging, enforce human testing before every deployment.
Head of digital — enforce a strict topic whitelist, enforce a real human escalation path with an SLA.
Head of contact center — enforce serious dialect testing, link chatbot KPIs to real NPS rather than “conversation closure.” See our contact center solutions and the persona dossier for MENA contact-center AI leads.
Sharia board (if applicable) — approve the topic whitelist, enforce a clean cutout for anything outside it.
A note on human-in-the-loop
Everything above assumes the institution has a human team capable of doing the review. That is where many institutions fall short: they do not have reviewers with enough linguistic background to judge quality, or enough fiqh, medical, or legal background to judge correctness. That gap is what we fill — a QA team specialized in Arabic NLP, with a leadership layer of Cairo PhD-linguists, reviewing 1% of production conversations weekly and surfacing the patterns before the compliance officer does.
It is not a service of “always bigger.” It is a service of “always honest.”
Closing note
If you deploy an Arabic chatbot in 2026 without passing the rubric above, you are not deploying a product — you are deploying a regulatory exposure. The next 18 months will sort the institutions that invested in governance before launch from the ones that accepted a “ready POC” from a vendor with no questions asked. The difference will be visible in the headlines.
We do not sell chatbots. We sell the QA layer, the human review, and the drift detection for the people who do. If you are in that seat, read the appendix below before launch. For terminology, see the glossary.
References
Annota8 is in early-stage operations and does not hold formal compliance certifications. Statements about regulatory approach reflect internal design intent, not certified status. Engage qualified local counsel and advisors for any active procurement or regulatory decision.
Footnotes
[1] IBM MEA newsroom, “SDAIA launches ALLaM on watsonx.” ALLaM publicly launched May 2024 by SDAIA in three sizes (7B, 13B, 70B). https://mea.newsroom.ibm.com/sdaia-launches-allam-on-watsonx Also: Microsoft Tech Community, “Introducing SDAIA and their latest Arabic LLM on Azure AI Model Catalog.” https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-sdaia-and-their-latest-arabic-llm-on-azure-ai-model-catalog/4240241
[2] Morgan Lewis, “Saudi Arabia Personal Data Protection Law: Transition Period Ends September 14” (2024). https://www.morganlewis.com/pubs/2024/09/saudi-arabia-personal-data-protection-law-transition-period-ends-september-14 Confirms full PDPL enforceability commenced 14 September 2024 under SDAIA supervision. Corroborated by Clyde & Co, “Saudi Arabia’s Personal Data Protection Law becomes enforceable” (2024). https://www.clydeco.com/en/insights/2024/09/saudi-arabia-s-personal-data-protection-law-become
[3] Clyde & Co, “Saudi Arabia Personal Data Protection Law: Third public consultation” (2025). https://www.clydeco.com/en/insights/2025/05/saudi-arabia-new-pdp-law-consultation SDAIA opened public consultation on proposed amendments to the Implementing Regulations with feedback window closing 27 May 2025. See also Securiti, “Key Proposed Updates to Saudi Arabia’s PDPL Implementing Regulations.” https://securiti.ai/key-proposed-updates-to-saudi-arabia-pdpl-implementing-regulations/
[4] Wattlecorp, “AI Security Risks in Saudi Banking.” Notes SAMA folds AI governance into the existing Cyber Security Framework, Business Continuity Framework, and consumer-protection circulars rather than a standalone AI standard. https://www.wattlecorp.com/ai-security-risks-in-saudi-banking-sama/ Also: PrivacyPulse, “How Saudi Fintech & Banks Align with PDPL and SAMA in 2025.” https://privacypulse.co/how-saudi-fintech-banks-align-with-pdpl-and-sama/
[5] CST official press release confirming the CITC-to-CST rename under Cabinet Decree No. 235 dated 07/04/1444 H. https://www.cst.gov.sa/en/mediacenter/pressreleases/Pages/20201018.aspx and https://www.cst.gov.sa/en
[6] Akin, “Kingdom of Saudi Arabia’s New Personal Data Protection Law and Implementing Regulations: Key Obligations, Responsibilities and Rights.” Controllers must respond to access requests within 30 days, extendable by up to 30 further days for unusual effort or multiple requests from the same subject. https://www.akingump.com/en/insights/alerts/kingdom-of-saudi-arabias-new-personal-data-protection-law-and-implementing-regulations-key-obligations-responsibilities-and-rights
[7] SDAIA, “Regulation on Personal Data Transfer Outside the Kingdom.” https://dgp.sdaia.gov.sa/wps/wcm/connect/e5bbede0-1119-4f70-b4ef-f043ce58d780/Regulation+on+Personal+Data+Transfer+Outside+the+Kingdom..pdf Risk assessment required for continuous or large-scale cross-border transfers of sensitive personal data, plus approved transfer mechanism. Analysis: King & Spalding, “International Personal Data Transfers under Saudi Arabia’s Data Protection Law.” https://www.kslaw.com/news-and-insights/international-personal-data-transfers-under-saudi-arabias-data-protection-law
[8] Baker McKenzie Global Data and Cyber Handbook, Saudi Arabia DPO section. https://resourcehub.bakermckenzie.com/en/resources/global-data-and-cyber-handbook/emea/saudi-arabia/topics/dpos-and-notification-requirements DPO appointment mandatory for public entities, controllers whose core activities involve regular monitoring of individuals at scale, and controllers whose core activities involve sensitive personal data. See also Securiti summary of SDAIA Rules for Appointing Personal Data Protection Officer. https://securiti.ai/saudi-arabia-rules-for-appointing-personal-data-protection-officer/
[9] Wikipedia, “Varieties of Arabic.” https://en.wikipedia.org/wiki/Varieties_of_Arabic Standard dialectology groupings range from five to seven or more when Yemeni and Sudanese are split out; consistent with the higher-resolution view used by Arabic NLP corpora (MADAR, NADI shared tasks). See also Discover Discomfort. https://discoverdiscomfort.com/arabic-dialects-maghrebi-egyptian-levantine-gulf-hejazi-msa/
[10] Securiti, “Saudi Arabia Personal Data Protection Law.” https://securiti.ai/saudi-arabia-personal-data-protection-law/ PDPL treats automated decision-making as a flagged processing activity: explicit consent is required for automated decisions, and Implementing Regulations require additional safeguards or DPIA where automated decisions have significant effects. Corroborated by the Akin alert in [6].
[11] KSAPDPL.COM, “Article 22 of the PDPL: Mandatory Data Impact Assessments (DPIA).” https://ksapdpl.com/ksa-saudi-pdpl-article-22-mandatory-data-impact-assessments-dpia/ Article 22 of the Implementing Regulations imposes a mandatory DPIA for processing tied to products/services, particularly for sensitive data and automated decisions. See also Wattlecorp DPIA implementation guide. https://www.wattlecorp.com/pdpl-saudi-arabia-dpia-implementation-guide/