Deploy & Monitor ML Models: Manage the deployment of LLM and chatbot models using Kubernetes, Docker, and cloud-native tools across AWS or GCP.
Model Orchestration: Implement ML pipelines using Airflow, Kubeflow, or MLflow for training, serving, and monitoring workflows.
CI/CD for ML Workflows: Set up Git-based CI/CD pipelines using Jenkins or similar tools to ensure reliable deployments and rollbacks.
Observability & Monitoring: Implement logging, tracing, and performance monitoring stacks (e.g., Grafana, Prometheus, ELK).
Cost Optimization: Continuously optimize resource utilization, scaling strategies, and deployment costs across cloud environments.
API & Traffic Management: Configure API gateways (e.g., Kong), load balancing, and event-streaming and coordination infrastructure (Kafka, ZooKeeper) to ensure reliable and secure model serving.
Automation & Scripting: Develop automation scripts and operational dashboards for model health, inference latency, and error tracking.
Microservices Architecture: Design and manage microservices-based ML systems for modular, scalable deployments.
Data Management: Ensure integration with PostgreSQL for metadata storage, logs, and system configuration.
JOB REQUIREMENTS
Education
Bachelor's degree in Computer Science, Data Engineering, AI/ML, or a related field.
Experience
1–3 years of experience in MLOps, DevOps, or Cloud Engineering with ML-focused workflows.
Experience with LLM-based models is preferred but not required.
Technical Skills