I will provide AIOps and SRE consulting for DevOps and cloud reliability
GPU Infrastructure · LLMOps Engineer · NVIDIA · Kubernetes · Neo Cloud
About this service
Are you shipping LLM products but struggling with GPU infrastructure, scaling, and reliability? I help teams build production-grade GPU platforms end-to-end.
What you get:
• Neo cloud GPU setup and cluster hardening
• Kubernetes GPU scheduling and autoscaling for LLM training and inference (vLLM/Ollama/Triton)
• MLOps/LLMOps CI/CD for models and data pipelines
• GPU monitoring and alerting using NVIDIA DCGM + Prometheus + Grafana
• Cost optimization, capacity planning, and observability best practices
Deliverables can include an architecture review, a deployment plan, and hands-on implementation, depending on the package tier.
Tools: Docker, GitLab, Jenkins, GitHub, CircleCI
Frameworks: Terraform, Ansible
Programming languages: Bash, Python, Golang
Expertise: Installation, Migration, Configuration
