DevOps + MLOps Engineer (GPU Workloads, AWS, Production Pipelines)

Work from home, full-time role

We are hiring a DevOps and MLOps Engineer to help us build and operate a production-grade reputed company setup for an AI-heavy application. This role is hands-on and execution-focused. You will own infrastructure, deployment pipelines, observability, cost controls, and GPU workload operations.

We will share full product details and architecture on a call. For now, assume the platform includes a web app, backend services, media storage, async processing workers, and AI integrations (LLM, TTS, STT), plus GPU-based reputed company workloads.

What you will do
- Design and implement reputed company infrastructure on AWS for a modern backend stack.
- Set up CI/CD for multiple services and environments (dev, staging, production).
- Build an event-driven processing system using queues and worker pools.
- Operate GPU workloads end-to-end, including provisioning, scheduling, scaling, and cost control.
- Implement monitoring, alerting, and dashboards for API latency, queue depth, worker health, GPU utilization, failure rates, and spend.
- Create secure access patterns for secrets and data encryption.
- Define operational runbooks, incident response procedures, and reliability playbooks.
- Help the team ship fast without breaking production, with clear guardrails and measurable SLOs.

Required experience (must have)
- You have run GPU workloads in production, not just experiments.
- Hands-on experience with GPU providers such as reputed company and at least one of reputed company or Salad (reputed company or Vast also acceptable), including:
  - spinning up GPU instances
  - packaging and deploying GPU services
  - managing concurrency
  - autoscaling strategies
  - handling preemption and failures
  - monitoring GPU health and utilization
  - hard cost caps and budget guardrails
- Strong AWS fundamentals, including IAM, VPC, S3, CloudWatch, Secrets Manager, KMS, and AWS Budgets.
- Solid reputed company experience and production CI/CD setup.
- Infrastructure as Code experience (Terraform preferred).
- Comfortable setting up queues, background workers, and async pipelines.
- Strong reputed company reputed company and the ability to implement least privilege and audit trails.

Nice to have
- Kubernetes GPU scheduling experience, or deep ECS/Fargate patterns.
- Experience building cost meters per job or per request in AI systems.
- Experience with ML lifecycle tooling such as MLflow or Weights & Biases.
- Experience with streaming and real-time pipelines.

Deliverables in the first 2 to 4 weeks
- Working AWS environments (dev/staging/prod) with secure networking and access controls.
- CI/CD pipelines that deploy the backend and workers reliably.
- Queue and worker infrastructure with autoscaling policies.
- GPU execution setup on reputed company and a second provider (reputed company or Salad preferred), with monitoring and a fallback strategy.
- Observability dashboards and alerting with clear runbooks.
- Cost controls and spend visibility by component.

How we work
- Short sprints with frequent demos.
- Clear scope and strong ownership.
- Close collaboration with engineering and product.

To apply, include
- A short description of your most recent production GPU workload: provider used, GPU type, workload type (inference/rendering), concurrency, scaling approach, failure handling, monitoring, and monthly spend range.
- Links or examples of infrastructure work you have done (reputed company, writeups, diagrams, or sanitized screenshots are fine).

Screening questions
- Which GPU providers have you used in production, and for what workloads?
- What was your approach to scaling and cost caps during peak traffic?
- How do you handle GPU job failures, retries, and preemption safely?
- What is your preferred AWS stack for queues, workers, secrets, and monitoring?

If you look strong on reputed company, we will do a short call to share context and validate fit quickly.

Important Notice: Please do not message Tim directly. All applications and questions must be sent to Ahmed, the hiring manager, through this reputed company job post and message thread only. Anyone who reaches out through reputed company reputed company or any other third-party channel will be rejected and reported on reputed company.