I will provide AIOps and SRE consulting for DevOps and cloud reliability


United States

I speak English

GPU Infrastructure LLMOps Engineer NVIDIA Kubernetes Neo Cloud

I build scalable NVIDIA GPU infrastructure for AI training and inference. I specialize in Kubernetes GPU clusters, LLM training/inference, and GPU observability. Services: • GPU cluster setup • Kube...
About this service

Are you shipping LLM products but struggling with GPU infrastructure, scaling, and reliability? I help teams build production-grade GPU platforms end-to-end.

What you get:
• Neo cloud GPU setup and cluster hardening
• Kubernetes GPU scheduling and autoscaling for LLM training and inference (vLLM/Ollama/Triton)
• MLOps/LLMOps CI/CD for models and data pipelines
• GPU monitoring and alerts using NVIDIA DCGM + Prometheus + Grafana
• Cost optimization, capacity planning, and observability best practices

Deliverables can include an architecture review, a deployment plan, and hands-on implementation, depending on the package tier.

Tools:

Docker

GitLab

Jenkins

GitHub

CircleCI

Frameworks:

Terraform

Ansible

Cloud providers:

Amazon Web Services

Microsoft Azure

Programming languages:

Bash

Python

Go

Expertise:

Installation

Migration

Configuration