AV simulation for KSA roads: sand storms and Hajj-density scenarios
Why I am writing this now
I have spent the past year in repeated conversations with AV teams working on major Saudi programs: NEOM and THE LINE specifically,[1] plus mobility programs at Diriyah Gate[4] and several public-transit pilots in Riyadh. The recurring complaint: “We use CARLA, NVIDIA DRIVE Sim, or Applied Intuition, and the off-the-shelf scenarios do not reflect what we actually see on the ground.” The answer was not always obvious, because the simulators themselves are very strong. Their default libraries, however, are weighted toward European and American urban styles.[1]
The Saudi challenge differs physically and statistically, not just cosmetically: harsh atmospheric effects on sensors, human densities not seen in any OECD country, dual-script signage, and traditional dress that produces segmentation problems the base models were never trained on.
This post is a map of what is missing, and what each scenario needs in authoring plus labeling. It is not an Annota8 product pitch — it is the practical read on the landscape I would use myself if I were running an AV program in NEOM today.
Scenario one: sand storms — the physical sensor challenge
A sand storm is not just “rain in a different color.” It is a physical environment that affects every sensor in a different way:
LiDAR. Airborne dust particles absorb and scatter beams at both 905 nm and 1550 nm wavelengths.[5] Result: severe attenuation of effective range, and false returns from the particles themselves. An AV model trained on clean LiDAR will see “objects” where there are none, or lose a vehicle clearly visible to a human eye at 50 meters. 1550 nm performs somewhat better in dust than 905 nm, but both are affected.[5]
Cameras. Dust accumulates gradually on the lens over hours of operation.[6] This is not a sudden event but continuous degradation. The model needs to recognize that the images from this camera are getting blurrier and reduce its confidence in them progressively.
Radar. The least affected sensor, but very high particle density can produce ghost returns.
GPS. During sand storms, GPS performance can degrade via L-band signal attenuation and multipath effects from dense airborne particulates, separate from the geomagnetic-storm ionospheric effects more commonly studied over the Arabian peninsula. A localization model that depends on GPS alone can lose meters of accuracy.
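The LiDAR range loss described above can be approximated with a Beer-Lambert extinction model. The following is a minimal sketch, assuming a single extinction coefficient per wavelength and received power proportional to exp(-2·alpha·R) / R². The function names are hypothetical, any coefficient you pass in is a placeholder until it is fitted to field measurements, and false returns from the particles themselves are not modeled.

```python
import math

def two_way_transmission(range_m: float, extinction_per_m: float) -> float:
    """Fraction of the clear-air return that survives the round trip (Beer-Lambert)."""
    return math.exp(-2.0 * extinction_per_m * range_m)

def effective_range(clear_range_m: float, extinction_per_m: float) -> float:
    """Largest range at which the dusty return still reaches the signal level
    that the sensor receives from clear_range_m in clear air.

    Assumes received power ~ exp(-2 * alpha * R) / R**2; solved by bisection.
    """
    if extinction_per_m <= 0:
        return clear_range_m

    def margin(r: float) -> float:
        # Dusty return at r relative to the clear-air return at clear_range_m.
        # A value >= 0 means the target is still above the detection level.
        return (clear_range_m / r) ** 2 * two_way_transmission(r, extinction_per_m) - 1.0

    lo, hi = 1e-6, clear_range_m
    for _ in range(60):  # margin() is monotonically decreasing in r
        mid = 0.5 * (lo + hi)
        if margin(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return lo
```

Calling `effective_range(200.0, alpha)` with one fitted `alpha` for 905 nm and another for 1550 nm gives a first-order comparison of how much range each wavelength keeps in the same storm.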
What this scenario needs in simulator authoring. CARLA 0.9.16 (September 2025) added dust-storm weather parameters, so a generic “sand storm” preset is now a primitive.[7] The realism gap remains, though: regional dust events differ from generic CARLA presets in optical depth and particle-size distribution, and calibration against regional field measurements is still required. You still need an extended pipeline detailing light-scattering properties plus LiDAR attenuation plus a dust-accumulation model on the camera as a function of time. NVIDIA DRIVE Sim is stronger at atmospheric particles by default,[8] but the parameter set must be calibrated to a regional sand-storm event, not lifted from a generic global preset.
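The progressive confidence reduction for a soiled camera can be sketched as a slow-moving average of a per-frame sharpness score compared against a clean-lens baseline. This is an illustration only: the class name, the smoothing factor, and the confidence floor are assumptions, and the sharpness score itself (for example, variance of the Laplacian) is computed elsewhere.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LensSoilingMonitor:
    baseline_sharpness: float    # sharpness score measured with a clean lens
    smoothing: float = 0.05      # EMA factor; small values follow gradual buildup, not single frames
    min_confidence: float = 0.1  # floor so the camera is down-weighted rather than silently dropped
    _ema: Optional[float] = field(default=None, repr=False)

    def update(self, sharpness: float) -> float:
        """Feed one frame's sharpness score; return a weight in [min_confidence, 1]."""
        if self._ema is None:
            self._ema = sharpness
        else:
            self._ema += self.smoothing * (sharpness - self._ema)
        ratio = self._ema / self.baseline_sharpness
        return max(self.min_confidence, min(1.0, ratio))
```

The returned weight can scale the camera's contribution in sensor fusion. The same curve, replayed against simulated time, is one way to drive a dust-accumulation effect in the simulator.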
What it needs in real-data labeling. Images and LiDAR point clouds from actual storms, labeled with: (1) visually estimated storm intensity, (2) effective camera range, (3) LiDAR return quality, (4) objects actually visible to a human eye but missed by sensors. This is foundational labeling work — it does not exist in COCO, nuScenes, or Waymo Open.
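One way to make the four storm labels concrete is a per-frame record with basic validation. The sketch below is hypothetical: the field names, the three-level intensity scale, and the choice to express LiDAR return quality as a ratio are assumptions, not an established schema.

```python
from dataclasses import dataclass, field
from enum import Enum

class StormIntensity(Enum):
    LIGHT = 1
    MODERATE = 2
    SEVERE = 3

@dataclass
class StormFrameLabel:
    frame_id: str
    intensity: StormIntensity   # (1) visually estimated by the labeler
    camera_range_m: float       # (2) farthest distance at which objects are resolvable
    lidar_return_ratio: float   # (3) usable returns / emitted pulses, in [0, 1]
    # (4) ids of objects a human can see that the sensors missed
    human_visible_missed: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.lidar_return_ratio <= 1.0:
            raise ValueError("lidar_return_ratio must be in [0, 1]")
        if self.camera_range_m < 0:
            raise ValueError("camera_range_m must be non-negative")
```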
Scenario two: Hajj density — pedestrian-traffic interaction
The Hajj season creates a phenomenon that does not exist in any OECD city: zones where pedestrian density reaches 6–8 people per m² at peak hours, and higher in short peak episodes,[2] with continuous vehicle traffic along the edges. Any motion-prediction model trained on San Francisco data will fail here, because the assumption of “pedestrians at the signal, empty road within the block” does not hold.
Model challenge. Object detectors that treat pedestrians as isolated cases (single-pedestrian bounding boxes) collapse. The need is for density-based models that treat humans as a continuous field, with group-level tracking and crowd-level behavior prediction rather than individual.
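Treating the crowd as a field starts with a density estimate per ground-plane cell. A minimal sketch, assuming pedestrian positions are already projected to metric ground coordinates; the function names and the threshold that separates "label individuals" from "label density only" are hypothetical.

```python
from collections import Counter

def density_grid(positions_m, cell_size_m=1.0):
    """Count pedestrians per ground-plane cell and convert to people per m^2.

    positions_m: iterable of (x, y) coordinates in metres.
    Returns {(col, row): density}.
    """
    area = cell_size_m ** 2
    counts = Counter(
        (int(x // cell_size_m), int(y // cell_size_m)) for x, y in positions_m
    )
    return {cell: n / area for cell, n in counts.items()}

def split_by_density(grid, individual_max=2.0):
    """Cells at or below the threshold are candidates for per-person tracking;
    denser cells are handled at region level (density and group flow)."""
    individual = {cell for cell, d in grid.items() if d <= individual_max}
    crowd = set(grid) - individual
    return individual, crowd
```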
Behavior challenge. Expected pedestrian behavior in a Hajj crowd is qualitatively different from individual pedestrians. The pattern is slow collective motion, sudden stops for entire groups, undefined walking lines, and collective decision-making in interaction with vehicles. Social-Force models need a radical recalibration.
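For reference, the basic social-force update that would need recalibrating looks roughly like the sketch below: a driving term that relaxes each pedestrian toward a desired velocity, plus an exponential repulsion from neighbours. All parameter values are placeholders and not calibrated to any crowd; group-level stops and collective decisions are not represented at all, which is part of why the baseline model falls short here.

```python
import math

def social_force_step(pos, vel, goals, dt=0.1, desired_speed=1.3,
                      tau=0.5, A=2.0, B=0.3, radius=0.25):
    """One explicit-Euler step of a basic social-force model (unit mass).

    pos, vel, goals: lists of (x, y) tuples, one entry per pedestrian.
    Cost is O(n^2) in the number of pedestrians.
    """
    new_pos, new_vel = [], []
    for i, ((px, py), (vx, vy), (gx, gy)) in enumerate(zip(pos, vel, goals)):
        # Driving term: relax toward desired_speed along the direction to the goal.
        dx, dy = gx - px, gy - py
        dist = math.hypot(dx, dy) or 1.0
        fx = (desired_speed * dx / dist - vx) / tau
        fy = (desired_speed * dy / dist - vy) / tau
        # Repulsion term: exponential push away from every other pedestrian.
        for j, (qx, qy) in enumerate(pos):
            if j == i:
                continue
            rx, ry = px - qx, py - qy
            d = math.hypot(rx, ry) or 1e-6
            push = A * math.exp((2 * radius - d) / B)
            fx += push * rx / d
            fy += push * ry / d
        vx, vy = vx + fx * dt, vy + fy * dt
        new_vel.append((vx, vy))
        new_pos.append((px + vx * dt, py + vy * dt))
    return new_pos, new_vel
```

Recalibration in this framing means fitting `desired_speed`, `tau`, `A`, and `B` to tracked field data, and likely adding group-level terms on top.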
What this scenario needs in simulator authoring. Applied Intuition is strong at authoring interaction scenarios, but default off-the-shelf scenarios typically feature tens, not thousands, of pedestrian agents in a scene. The need is to extend the simulator to support thousands of agents in frame, with crowd-behavior models that match real-world dynamics. That is heavy engineering work — it does not happen through GUI settings.
What it needs in real-data labeling. Field footage from holy-site approach zones, the bus corridors, and the outer Hajj routes. The labeling required: (1) estimated density per region of the scene, (2) individual segmentation at the edges and at vehicle/pedestrian interfaces (where density makes individual labeling possible), (3) group-flow tracking, (4) “collective stop” and “collective movement” events. This is specialized labeling that needs teams trained on high-density scenes.
Note: Saudi authorities are highly sensitive about imagery of the holy sites. Any labeling program for this scenario requires explicit permission, and it is generally better to work on bus and outer-route data rather than direct imagery inside the inner courts.
Scenario three: bilingual signage
Saudi highways display signs in Arabic and English together, which is well known.[3] Less well known is that the numerals themselves appear in Eastern Arabic (“Hindi”) form ١٢٣٤٥ on legacy signage and in Latin form 12345 on newer signage. The 2023 Saudi Highway Code mandates Western Arabic numerals on new signs going forward, so old and new signs coexist across the road network in a mixed legacy environment.[3] Speed signage varies with the age of the road, too.
Challenge. Scene-text recognition models trained on ICDAR and COCO-Text are almost entirely Latin-text focused. Arabic editions lack the typeface diversity actually present on Saudi roads, and they lack Eastern-Arabic-numeral handling. A vehicle that misreads “speed limit ١٢٠” produces a wrong driving decision.
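Once a recognizer has produced the sign text, the numeral form still has to be normalized before the value reaches the planner. A minimal sketch of that post-processing step, using only the Python standard library; the function names and the form labels are hypothetical, and the recognizer itself is out of scope.

```python
import re
from typing import Optional

# Eastern Arabic digits are U+0660 through U+0669.
EASTERN_ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
LATIN_DIGITS = "0123456789"
_TO_LATIN = str.maketrans(EASTERN_ARABIC_DIGITS, LATIN_DIGITS)

def numeral_form(text: str) -> str:
    """Classify which digit set appears in a transcribed sign."""
    has_eastern = any(ch in EASTERN_ARABIC_DIGITS for ch in text)
    has_latin = any(ch in LATIN_DIGITS for ch in text)
    if has_eastern and has_latin:
        return "mixed"
    if has_eastern:
        return "eastern_arabic"
    if has_latin:
        return "latin"
    return "none"

def parse_speed_limit(text: str) -> Optional[int]:
    """Return the first integer on the sign, regardless of numeral form."""
    match = re.search(r"[0-9]+", text.translate(_TO_LATIN))
    return int(match.group()) if match else None
```

With this in place, a sign transcribed as ١٢٠ and one transcribed as 120 both yield the integer 120, and `numeral_form` supplies the numeral-form label for the dataset.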
Added challenge. NEOM, THE HEART OF NEOM, and THE LINE use custom typography that no commercial model has seen. This is the kind of signage that will multiply in volume over the next five years.
What this scenario needs in simulator authoring. CARLA supports asset substitution for signs but assumes Latin text. The need is a trusted Saudi sign-asset library covering official Ministry of Transport signage, plus the new-city signage, plus commercial road signage. The library must include variants for lighting, weathering, and discoloration.
What it needs in real-data labeling. Real driving footage from Riyadh-Jeddah-Dammam-Tabuk routes, labeled with: (1) bounding box for each sign, (2) transcribed Arabic text, (3) transcribed English text, (4) numeral form (Eastern Arabic / Latin), (5) sign type (speed/directional/warning/service). This requires linguistically skilled Arab labelers.
Scenario four: the traditional-dress silhouette — a pedestrian-detection challenge
Pedestrian segmentation models — YOLO, Mask R-CNN, and their successors — are trained on imagery where the majority of pedestrians wear Western dress: trousers, short skirts, fitted coats. The model’s prototype silhouette for “this is a human” expects a head over a torso over two distinguishable separated legs.
The thobe — the long white Saudi garment — produces a continuous vertical silhouette. The legs are not visually separated. White on a light background in daylight is near-zero contrast. The abaya — the long black garment worn by women — produces a solid black silhouette, particularly in low light. The head area is covered, and facial features are not visible to face detectors.
Standard challenge. Published pedestrian-detector fairness research has documented bias by age, skin tone, pose, and view angle.[9] By extension, the continuous low-contrast silhouettes produced by thobe and abaya garments are a reasoned hypothesis for further degradation: lower confidence, less accurate bounding boxes, and slower pedestrian recognition relative to the Western-dress prototype the base models were trained on. For autonomous driving, that is the difference between a safe decision and a wrong one, and it is precisely the kind of bias that needs measurement on regional data before it can be claimed as fact.
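Measuring that hypothesis comes down to comparing detection recall across dress categories on labeled regional data. A minimal sketch, assuming each ground-truth pedestrian already carries a dress label and a flag for whether some detection matched it (for example at IoU ≥ 0.5); the group names are illustrative.

```python
from collections import defaultdict

def recall_by_group(ground_truth):
    """ground_truth: iterable of (group, detected) pairs, one per labeled pedestrian.

    Returns {group: recall}.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for group, detected in ground_truth:
        totals[group] += 1
        hits[group] += bool(detected)
    return {g: hits[g] / totals[g] for g in totals}

def recall_gap(recalls, reference, group):
    """Positive values mean the detector finds `group` less often than `reference`."""
    return recalls[reference] - recalls[group]
```

The gap should be reported per lighting condition and distance band as well, since an aggregate number can hide the cases that matter most.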
What this scenario needs in simulator authoring. Pedestrian assets in thobe and abaya within both CARLA and NVIDIA DRIVE Sim. These are not default-available. The library needs to include color variety (white thobe, light blue, gray), abaya length variation, head-covering position, and age range.
What it needs in real-data labeling. Driving imagery from streets in Riyadh, Jeddah, and Al Khobar, with: (1) precise labeling of every pedestrian in traditional dress, (2) lighting conditions (day, night, street-light), (3) background contrast (a pedestrian in white thobe against a white wall is a critical case), (4) typical postures (walking, standing, gathering).
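The background-contrast label can be computed rather than judged by eye. A minimal sketch using Weber contrast between pixels inside the pedestrian mask and pixels in a ring around it; the 0.1 threshold for flagging a critical case is a hypothetical starting value, not a validated one.

```python
from statistics import mean

def weber_contrast(pedestrian_luma, background_luma):
    """Weber contrast of the pedestrian region against its surroundings.

    Inputs are iterables of luminance values (for example 0..255).
    Values near 0 mean the silhouette blends into the background.
    """
    bg = mean(background_luma)
    if bg == 0:
        raise ValueError("background luminance is zero; Weber contrast is undefined")
    return (mean(pedestrian_luma) - bg) / bg

def is_critical_case(pedestrian_luma, background_luma, threshold=0.1):
    """Flag low-contrast pedestrians, such as a white thobe against a white wall."""
    return abs(weber_contrast(pedestrian_luma, background_luma)) < threshold
```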
Scenario five: false positives from palm trees and desert
A palm tree has a vertical trunk plus a leaf mass with dark contrast. Object detectors frequently misclassify it — sometimes as a light pole, sometimes as a distant pedestrian, sometimes as an unknown object. On a highway flanked by palm trees on both sides, this becomes a safety issue.
Sandy desert backgrounds lack visual features — low texture. That confuses optical-flow algorithms used in ego-motion estimation. And shadows are long and sharp at sunrise and sunset, creating phantom “objects” on the ground.
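The low-texture problem can be quantified with the smaller eigenvalue of the image structure tensor, the same score Shi-Tomasi feature selection uses. The sketch below is written in plain Python over a small grayscale patch for clarity; a production pipeline would use a vectorized library, and the function name is hypothetical.

```python
import math

def min_eigen_texture(patch):
    """Smaller eigenvalue of the 2x2 structure tensor summed over a patch.

    patch: list of rows of luminance values, at least 3x3.
    Near zero means optical flow is ill-conditioned in this region:
    flat sand gives ~0, and a single straight edge also gives ~0.
    """
    sxx = sxy = syy = 0.0
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = (patch[r][c + 1] - patch[r][c - 1]) / 2.0  # central difference in x
            gy = (patch[r + 1][c] - patch[r - 1][c]) / 2.0  # central difference in y
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace - root
```

Regions that score near zero across most of the frame are the ones where ego-motion estimation should lean on other sensors.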
What this scenario needs in simulator authoring. A regional vegetation-asset library — palm trees at varying ages and species, athel, desert shrubs — and desert regions with specific time-of-day lighting. This differs physically from Nevada desert in sun angle and atmospheric dust density.
What it needs in real-data labeling. Footage from the Riyadh-Dammam highway and from the outskirts of Mecca at dawn and dusk, with precise labeling of every palm tree and every shadow blob on the ground. Direct benefit: training detectors to stop flagging these patterns as moving objects.
Linking to Saudi mobility programs
NEOM and THE LINE aim to be among the first cities built around fully autonomous mobility, including a NEOM-Pony.ai partnership and the first KSA robotaxi permit.[1] Diriyah Gate weaves smart-mobility components into a historic-cultural development program.[4] Each of these programs will encounter the five scenarios above in different proportions: NEOM is more exposed to desert phenomena (scenario five) and severe sand storms (scenario one), while Riyadh and Jeddah handle high human density during Hajj and Umrah season (scenario two), signage (scenario three), and traditional dress (scenario four).
A model trained on North American data plus the default CARLA scenario library will not be safe in these environments. Not because the model is “bad,” but because the statistical distribution of training data does not include these cases. That is a qualitative gap.
What an AV team actually does
If I were running an AV program in NEOM or in the Saudi Ministry of Transport today, the order I would follow:
1. Bound the regional scenarios to a manageable count. The five above are not exhaustive, but they are a starting point. Add mountain-road scenarios in Asir and high-humidity coastal conditions along the Red Sea, and you have broader coverage.
2. Start targeted field data collection. For each scenario, planned drive hours dedicated to capturing the critical cases. This matters more than buying off-the-shelf commercial datasets, because regional data is thin and low-quality in the commercial market.
3. Deep labeling, not cheap labeling. The most dangerous decision at this stage is to send the work to a general, foreign labeling vendor whose labelers have never seen the region. Result: surface-level labels, missed critical cases, and training data that makes the problem worse rather than better.
4. Simulation authoring in parallel with real data. Simulation alone is insufficient, and real data alone is insufficient. The right answer is synthesis: simulation scenarios calibrated against real data, then model evaluation on held-out real data.
5. An open validation loop with the regulator. AV programs in the Kingdom fall under the Transport General Authority’s oversight, with pilots geofenced and subject to real-time TGA monitoring, plus the Communications, Space and Technology Commission (CST) on communications aspects.[10] Early engagement, two years before deployment, avoids regulatory surprises.
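The held-out real data mentioned in the simulation step above only means something if no drive contributes frames to both sides of the split, since consecutive frames from one drive are nearly identical. A minimal sketch of a drive-level split; the function name, the 20% default, and the `(drive_id, frame_id)` input shape are assumptions.

```python
import random
from collections import defaultdict

def split_by_drive(frames, holdout_fraction=0.2, seed=0):
    """frames: iterable of (drive_id, frame_id) pairs.

    Returns (train, holdout) lists of pairs such that no drive appears in both.
    """
    by_drive = defaultdict(list)
    for drive_id, frame_id in frames:
        by_drive[drive_id].append(frame_id)
    drives = sorted(by_drive)
    random.Random(seed).shuffle(drives)  # seeded so the split is reproducible
    n_holdout = max(1, round(len(drives) * holdout_fraction))
    holdout_drives = set(drives[:n_holdout])
    train = [(d, f) for d in drives if d not in holdout_drives for f in by_drive[d]]
    holdout = [(d, f) for d in drives if d in holdout_drives for f in by_drive[d]]
    return train, holdout
```

Splitting by scenario and by location as well, not just by drive, would give a stricter test of generalization.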
The real gap
The gap is not in technical capability. CARLA, NVIDIA DRIVE Sim, and Applied Intuition are powerful tools. The gap is in regional data and asset libraries, and in the existence of labeling teams who are linguistically and culturally able to understand what needs labeling. That gap is bridgeable, but it requires the team to acknowledge that “off-the-shelf scenarios” are not sufficient.
From the Annota8 side, we work today on labeling regional AV use cases for several teams — we do not name them, but the methodology is portable. If you want a working conversation about any one of the five scenarios above, reach out.
Connecting to the rest of the stack
- Our reference methodology at AV simulation and scenario synthesis — the labeling-pipeline pattern Annota8 is designed around for AV validation teams. Specific delivery is scoped per engagement.
- A broader view of automotive and AV solutions at Annota8.
- If you are a KSA smart-city and AV mobility AI lead, this post was written with your role in mind.
- Technical terms used here — the glossary.
A closing note: autonomous vehicles will arrive on Saudi roads. The question is not “when,” but “on what data they trained before they arrived.” Teams investing today in deep regional data will arrive with safe models. Those who assume Waymo Open is enough will discover the difference in the first sand storm.
References
Annota8 is in early-stage operations and does not hold formal compliance certifications. Statements about regulatory approach reflect internal design intent, not certified status. Engage qualified local counsel and advisors for any active procurement or regulatory decision.
Footnotes
1. NEOM, “Mobility sector.” https://www.neom.com/en-us/our-business/sectors/mobility ; NEOM Directory, “Pony.ai secures $100M investment from NEOM.” https://neom.directory/neom-news/autonomous-driving-company-pony-ai-secures-100m-investment-from-neom ; Dosovitskiy et al., “CARLA: An Open Urban Driving Simulator,” 2017. https://arxiv.org/pdf/1711.03938 (NEOM mobility plan relies on autonomous shuttles, robo-taxis, ART, and a $100M Pony.ai investment with the first KSA robotaxi permit; CARLA default towns are largely European/American-styled.)
2. PMC, “Crowd dynamics and safety at the Hajj,” PMC4078860. https://pmc.ncbi.nlm.nih.gov/articles/PMC4078860/ ; PMC, mass-gathering crowd-density study, PMC7104005. https://pmc.ncbi.nlm.nih.gov/articles/PMC7104005/ (Documented densities of 6–8 people/m² near the Kaaba and up to 12 people/m² in peak Tawaf episodes.)
3. Wikipedia, “Road signs in Saudi Arabia.” https://en.wikipedia.org/wiki/Road_signs_in_Saudi_Arabia (Saudi signage is bilingual Arabic/English per the 2007 Traffic Law and the Vienna Convention; distances historically displayed in Eastern Arabic numerals; the 2023 Saudi Highway Code mandates Western Arabic numerals on new signs going forward, producing a mixed legacy environment.)
4. Diriyah Gate Development Authority. https://www.dgda.gov.sa/en ; Saudi Vision 2030, “Diriyah project.” https://www.vision2030.gov.sa/en/explore/projects/diriyah ; Kakao Corp, “Kakao Mobility MOU with Diriyah.” https://www.kakaocorp.com/page/detail/11752?lang=ENG (Diriyah Gate is a $63.2B giga-project with explicit pedestrianisation, metro integration, smart-mobility plans, and a Kakao Mobility MOU for future mobility services.)
5. Lumimetric, “905nm and 1550nm LiDAR Laser Comparison.” https://www.lumimetric.com/en/new/905nm-and-1550nm-LiDAR-Laser-Comparison.html ; DiVA Portal (Linköping University thesis on LiDAR atmospheric attenuation). https://www.diva-portal.org/smash/get/diva2:1568716/FULLTEXT01.pdf ; EE Times, “What’s the direction for automotive LiDAR — 905 nm or 1550 nm?” https://www.eetimes.com/whats-the-direction-for-automotive-lidar-905-nm-or-1550-nm/ (Both wavelengths are scattered by airborne dust; 1550 nm performs better than 905 nm but both are affected.)
6. Uricar et al., “Let’s Get Dirty: GAN Based Data Augmentation for Camera Lens Soiling Detection in Autonomous Driving” (Valeo), 2019. https://arxiv.org/abs/1912.02249 ; arXiv preprint on lens soiling for AV cameras. https://arxiv.org/pdf/1911.01054 (Lens soiling from mud, dust, and water is treated as a primary, continuous degradation mode requiring detection and cleaning subsystems.)
7. CARLA Team, “CARLA 0.9.16 Release Notes,” September 16, 2025. https://carla.org/2025/09/16/release-0.9.16/ (Confirms dust-storm weather parameters and post-process effects for dusty weather; dust storm parameter switched from bool to float.)
8. NVIDIA, “Autonomous Vehicle Simulation (DRIVE Sim).” https://www.nvidia.com/en-us/use-cases/autonomous-vehicle-simulation/ ; The Robot Report, “How AV developers use virtual driving simulations to stress test adverse weather.” https://www.therobotreport.com/how-av-developers-use-virtual-driving-simulations-stress-test-adverse-weather/ (DRIVE Sim recreates rain, dust, and snow down to particle size, droplet shape, and distribution.)
9. Wilson, Hoffman, and Morgenstern, “Predictive Inequity in Object Detection,” 2019. https://ar5iv.labs.arxiv.org/html/1906.10490 ; Li et al., “Bias Behind the Wheel: Fairness Analysis of Autonomous Driving Systems,” 2023. https://arxiv.org/pdf/2308.02935 (Pedestrian-detection fairness research documenting bias by age, skin tone, pose, and view angle.)
10. Gibson Dunn, “Regulating Autonomous Vehicles in Saudi Arabia.” https://www.gibsondunn.com/regulating-autonomous-vehicles-in-saudi-arabia/ ; Transport General Authority. https://www.tga.gov.sa/en ; Al Tamimi & Co., “Driving into the future: regulatory updates on AVs in KSA.” https://www.tamimi.com/law-update-articles/driving-into-the-future-regulatory-updates-on-avs-in-the-kingdom-of-saudi-arabia/ (TGA is the competent regulator for AV pilots and on-road deployment in KSA; pilots are geofenced and require real-time TGA monitoring; CST handles communications aspects.)