RiskFabric

Built with: Rust, Python, Polars, ClickHouse, Redpanda, Redis, Docker. License: MIT. Docs: mdBook.

RiskFabric is a fraud intelligence platform that generates synthetic Indian payment transaction data, processes it through a Medallion ETL pipeline, and produces trained fraud detection models.

✨ Key Features

  • Extreme Throughput: Achieves ~182,000 Transactions Per Second (TPS) using a parallelized "One-Pass" architecture.
  • Agent-Based Realism: Simulates the full lifecycle of Customers, Accounts, and Cards, with behavioral spend profiles driven by real-world heuristics.
  • Geographic Fidelity: Integrates OpenStreetMap (OSM) India data and Uber H3 hexagonal indexing to produce realistic spatial spend patterns and location anomalies.
  • Sophisticated Fraud Injection: Includes signatures for UPI Scams, Account Takeover (ATO), Card Not Present (CNP) fraud, and coordinated campaigns.
  • Medallion Data Architecture: A full pipeline taking data from Bronze (Raw) to Silver (Feature Engineered) to Gold (ML-Ready).
  • ML Mastery: Built-in leakage prevention and simulated label noise (false positives and false negatives), so that models are trained and evaluated under conditions closer to production.
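
The simulated label noise mentioned above can be pictured as flipping a small share of ground-truth labels before training. The following is a minimal sketch of that idea, not RiskFabric's actual implementation: the function name `apply_label_noise` and the parameters `fn_rate` and `fp_rate` are hypothetical, and the rates used in the example are arbitrary.

```python
import random

def apply_label_noise(labels, fn_rate, fp_rate, seed=0):
    """Return a noisy copy of binary fraud labels (1 = fraud, 0 = legitimate).

    fn_rate: probability that a true fraud is recorded as legitimate.
    fp_rate: probability that a legitimate transaction is recorded as fraud.
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    noisy = []
    for y in labels:
        flip_prob = fn_rate if y == 1 else fp_rate
        noisy.append(1 - y if rng.random() < flip_prob else y)
    return noisy

# Illustrative data: 100 fraud labels followed by 900 legitimate ones.
true_labels = [1] * 100 + [0] * 900
observed = apply_label_noise(true_labels, fn_rate=0.10, fp_rate=0.01, seed=42)
flipped = sum(t != o for t, o in zip(true_labels, observed))
print(f"{flipped} of {len(true_labels)} labels flipped")
```

Keeping the clean labels alongside the noisy ones makes it possible to train on the observed labels while still measuring performance against the ground truth.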

🛠️ Tech Stack

  • Core Engine: Rust (Rayon for parallelization, Rand for deterministic simulation).
  • Real-time Streaming: Redpanda (Kafka-compatible), rdkafka, and Tokio async runtime.
  • Data Processing: Polars 0.51.0 (Lazy API & high-performance transformation).
  • Data Warehouse: PostgreSQL (Spatial/OSM staging), ClickHouse (High-volume transactions), and dbt (Analytical enrichment).
  • Feature Store: Redis (Low-latency state for real-time Z-scores and behavior).
  • Data Ingestion: dlt (data load tool) for Modern Data Stack (MDS) integration.
  • Machine Learning: Python (XGBoost) with real-time inference via scorer.py.
  • Infrastructure: Docker/Podman orchestration with Prometheus and Grafana for observability.
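
To illustrate the kind of low-latency state a feature store holds for real-time Z-scores, here is a minimal sketch using Welford's online algorithm. It is an assumption about how such a feature could be maintained, not a description of RiskFabric's code: the `RunningStats` class, the `score_and_update` helper, and the in-memory `state` dict (standing in for Redis) are all hypothetical.

```python
import math

class RunningStats:
    """Welford's online algorithm: mean and variance without storing history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def zscore(self, amount):
        """Z-score of `amount` against the history seen so far."""
        if self.count < 2:
            return 0.0  # not enough history to estimate spread
        std = math.sqrt(self.m2 / (self.count - 1))
        return 0.0 if std == 0 else (amount - self.mean) / std

    def update(self, amount):
        """Fold a new amount into the running mean and variance."""
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)

# In-memory stand-in for the feature store, keyed by a hypothetical card ID.
state = {}

def score_and_update(card_id, amount):
    stats = state.setdefault(card_id, RunningStats())
    z = stats.zscore(amount)  # score against past behaviour only
    stats.update(amount)      # then fold the new amount into the state
    return z

for amt in [100.0, 110.0, 90.0, 100.0, 500.0]:
    print(amt, round(score_and_update("card-1", amt), 2))
```

Only three numbers per card (`count`, `mean`, `m2`) need to be stored, which is why this style of statistic suits a key-value store. Scoring before updating keeps the current transaction out of its own baseline.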

📁 Project Structure

🧠 Core Simulation (src/)

  • generators/: Agent-Based Modeling (ABM) logic, entity creation, and fraud mutation engines.
  • models/: Rust structures for Customers, Accounts, Cards, and Transactions.
  • bin/: CLI binaries for data generation (generate.rs), streaming (stream.rs), and preparation.
  • config.rs: Centralized, type-safe configuration engine for simulation parameters.
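
One common way to keep a parallel generator deterministic is to derive an independent random stream per entity from a single global seed, so results do not depend on thread scheduling. The sketch below shows that idea in Python for brevity; the engine itself is written in Rust with Rayon and Rand, and its actual seeding scheme is not described here. All names (`customer_rng`, `generate_for_customer`, `generate_all`) and the transaction shape are hypothetical.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor

def customer_rng(global_seed, customer_id):
    """Derive an independent, reproducible RNG for one customer."""
    digest = hashlib.sha256(f"{global_seed}:{customer_id}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

def generate_for_customer(global_seed, customer_id, n_txns=3):
    rng = customer_rng(global_seed, customer_id)
    # Hypothetical transaction shape: (customer_id, amount in INR).
    return [(customer_id, round(rng.uniform(10, 5000), 2)) for _ in range(n_txns)]

def generate_all(global_seed, customer_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so output order is stable.
        chunks = pool.map(
            lambda cid: generate_for_customer(global_seed, cid), customer_ids
        )
        return [txn for chunk in chunks for txn in chunk]

# The same seed gives identical data regardless of the worker count.
assert generate_all(7, range(10), workers=1) == generate_all(7, range(10), workers=4)
```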

🥈 ETL & Data Warehouse (src/etl/ & warehouse/)

  • etl/: Multi-stage Polars transformation pipeline (Silver/Gold feature engineering).
  • warehouse/: dbt project for geographic enrichment and merchant risk profiling using PostGIS.
  • dlt/: MDS integration for automated data lake ingestion.
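
The Bronze, Silver, and Gold stages can be sketched in a few lines. This is an illustration of the general Medallion pattern under stated assumptions, written in plain Python rather than Polars: the column names (`ts`, `customer_id`, `amount`, `is_fraud`), the derived features, and the functions `to_silver` and `to_gold` are hypothetical and do not reflect the project's actual schema.

```python
def to_silver(bronze_rows):
    """Add per-customer features computed from earlier transactions only."""
    history = {}  # customer_id -> list of prior amounts
    silver = []
    for row in sorted(bronze_rows, key=lambda r: r["ts"]):
        prior = history.setdefault(row["customer_id"], [])
        prior_mean = sum(prior) / len(prior) if prior else 0.0
        silver.append(
            {**row, "prior_txn_count": len(prior), "prior_mean_amount": prior_mean}
        )
        prior.append(row["amount"])  # current row becomes history afterwards
    return silver

def to_gold(silver_rows, feature_cols, label_col="is_fraud"):
    """Keep only model inputs and the label; drop IDs and raw fields."""
    return [{c: r[c] for c in feature_cols + [label_col]} for r in silver_rows]

# Bronze: raw rows as generated (arrival order need not match time order).
bronze = [
    {"ts": 2, "customer_id": "c1", "amount": 300.0, "is_fraud": 1},
    {"ts": 1, "customer_id": "c1", "amount": 100.0, "is_fraud": 0},
    {"ts": 3, "customer_id": "c1", "amount": 50.0, "is_fraud": 0},
]
silver = to_silver(bronze)
gold = to_gold(silver, ["amount", "prior_txn_count", "prior_mean_amount"])
print(gold)
```

Computing each feature from strictly earlier rows is one simple form of leakage prevention: no transaction contributes to its own features, and no future transaction contributes to a past one.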

🤖 Machine Learning (src/ml/)

  • train_xgboost.py: Training pipeline with feature sanitization and out-of-time (OOT) validation.
  • scorer.py: Real-time inference service that consumes transactions from Kafka and reads stateful features from Redis.
  • seed_redis.py: Point-in-time state synchronization between the warehouse and feature store.
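
Out-of-time validation means holding out the most recent period instead of a random sample, so the model is always evaluated on data later than anything it was trained on. A minimal sketch follows, assuming rows carry a `timestamp` field; the `oot_split` function and the example rows are hypothetical and not taken from train_xgboost.py.

```python
from datetime import datetime

def oot_split(rows, cutoff, ts_key="timestamp"):
    """Split rows into (train, holdout) by time rather than at random."""
    train = [r for r in rows if r[ts_key] < cutoff]
    holdout = [r for r in rows if r[ts_key] >= cutoff]
    return train, holdout

# Illustrative rows only; values are made up for the example.
rows = [
    {"timestamp": datetime(2024, 1, 5), "amount": 250.0, "is_fraud": 0},
    {"timestamp": datetime(2024, 2, 10), "amount": 9000.0, "is_fraud": 1},
    {"timestamp": datetime(2024, 3, 1), "amount": 120.0, "is_fraud": 0},
    {"timestamp": datetime(2024, 3, 20), "amount": 4300.0, "is_fraud": 1},
]
train, holdout = oot_split(rows, cutoff=datetime(2024, 3, 1))
print(len(train), "training rows,", len(holdout), "out-of-time rows")
```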

🛠️ Infrastructure & Docs

  • docker-compose.yml: Orchestrated local stack (ClickHouse, Postgres, Redpanda, Redis, Grafana).
  • documentation/: Architectural docs and theory of operation (mdBook).
  • data/config/: Behavioral rules and system tuning YAML configurations.

📈 Benchmarks (150k Txns)

Architecture          Throughput     Total Time   Speedup
Sequential Port       3,400 TPS      48.7s        1x
Optimized One-Pass    182,000 TPS    4.4s         53x

Developed by harshafaik