
Abdullah Khan
AI Architect: 5 years, Enterprise RAG Systems, Agents and AWS MLOps
Skills

My Services


Portfolio
Work Experience
AI Engineer
Devsinc
Nov 2025 - Present • 6 mos
Architected and scaled a modular AI Agent Orchestration Platform designed to automate complex B2B workflows, reducing operational overhead by 40% and enabling enterprise clients to deploy custom autonomous agents at scale.
• Designed a high-performance RAG (Retrieval-Augmented Generation) Pipeline utilising Python (FastAPI) and Vector Databases (Pinecone/Weaviate). Implemented advanced retrieval strategies including hybrid search (semantic + keyword), re-ranking models (Cohere), and query expansion to ensure 95%+ accuracy in context retrieval for domain-specific knowledge bases.
• Engineered a Multi-Agent Workflow Engine using LangGraph and CrewAI, enabling collaborative task execution between specialised agents. Implemented state management and "Long-term Memory" modules using Redis, allowing agents to maintain context across multi-session user interactions.
• Optimised LLM Inference and Cost Management by implementing a dynamic routing layer that selects models (GPT-4o, Claude 3.5 Sonnet, or Llama 3) based on task complexity and token cost. Developed a custom Semantic Caching layer that reduced API latency by 60% for repetitive queries.
• Built a "Human-in-the-Loop" (HITL) Intervention UI, integrating with Slack and Microsoft Teams. The system pushes high-uncertainty agent decisions to human supervisors for real-time correction, using the feedback to fine-tune local models and improve future autonomous performance.
• Implemented Full-Stack Observability and evaluation frameworks using LangSmith and Weights & Biases. Established automated "eval" suites to test for hallucinations, prompt injections, and regression across model updates, ensuring production-grade reliability for enterprise deployments.
• Developed and Deployed Scalable Infrastructure on AWS (EKS, Lambda, SQS) using Terraform (IaC). Leveraged Docker and Kubernetes to manage auto-scaling GPU workloads for local model fine-tuning (LoRA/QLoRA) and high-concurrency inference tasks.
AI Engineer
Prypco
Oct 2023 - Oct 2025 • 2 yrs
Architected and deployed an end-to-end AI Document Intelligence Platform for mortgage processing, reducing manual data entry and enabling real-time validation feedback loops between Mortgage Advisors (MAs) and customers.
• Designed a high-performance, asynchronous validation engine using Python (FastAPI) and Azure Document Intelligence (OCR) to process complex financial documents (Passports, Bank Statements, Salary Certificates). Implemented structured data extraction using OpenAI (GPT-5.2) with strict Pydantic schema validation to ensure data integrity.
• Engineered a multi-channel orchestration layer integrating WhatsApp and Slack. Built a "Human-in-the-Loop" workflow where:
  - Customers are onboarded and nudged via WhatsApp to upload documents.
  - AI extraction results and validation issues are pushed instantly to Slack threads for MA review.
  - MA verdicts (Approve/Reject) trigger automated feedback to the customer portal, streamlining the re-upload process for rejected documents.
• Implemented complex domain logic and cross-document validation, enabling the system to automatically detect inconsistencies across file sets (e.g., matching salary credits in bank statements against salary certificates, verifying residency status across Visa/Passport/EID) before the case reaches the Credit Analyst.
• Integrated full-stack observability using Langfuse, creating a granular tracing system that tracks the lifecycle of every document. Developed custom dashboards to monitor:
  - AI vs. Human Agreement: Tracking how often MAs override AI verdicts.
  - Operational Metrics: Latency, total pipeline cost per application, and document rejection rates.
• Optimised infrastructure cost and performance by implementing a non-blocking architecture (asyncio) for I/O-bound tasks and thread-pooling for CPU-bound OCR tasks, ensuring the system handles high-concurrency workloads without degradation.
Data Scientist
SutureHealth
Dec 2020 - Jul 2023 • 2 yrs 7 mos
• Designed and implemented a patient-facing care plan generation app, enabling end-to-end workflows from mobile audio recording of physician-patient conversations to structured care plan output.
• Built and evaluated a multi-model transcription pipeline (AWS Transcribe, HealthScribe, AssemblyAI, Deepgram Nova-3, Medical Transcribe), selecting optimal services for the MVP. Implemented two transcription flows:
  - Batch transcription for downstream NLP AI agent summarisation and structured care plan generation.
  - Live transcription via WebSockets, incorporating chunking and latency optimisation for near real-time closed captioning.
• Developed an agent-based orchestration layer that summarises doctor-patient conversations into predefined structured encounter notes.
• Architected and deployed backend infrastructure using AWS ECS Fargate, Kinesis, Lambda, SQS, S3, RDS, SNS, Cognito, IAM, and Terraform/CDK for HIPAA-compliant scalability.
• Implemented OCR workflows with AWS Textract for extracting structured data from clinical documents.
• Delivered a solution for another client to classify medical documents, extract key metadata, and detect bounding boxes for physician signatures and dates, leveraging LLM agents and AWS Textract.