Description:
We’re building production-grade NLP systems and need someone who can take a model from research to reliable, scalable deployment. You’ll own the full lifecycle — from containerisation to live inference endpoints.
What you’ll do:
• Package, serve, and monitor small language models on AWS SageMaker Serverless endpoints with optimised cold-start behaviour
• Build slim multi-stage Docker images, push to ECR, and keep inference images under tight size budgets
• Own the build → test → push → deploy CI/CD pipeline for ML services
• Configure IAM roles and manage secrets via AWS Secrets Manager following least-privilege principles
• Version datasets, models, and experiments; instrument latency, throughput, and accuracy in production
• Build and iterate on NLP pipelines using spaCy, Transformers, FAISS, and PyTorch
Requirements:
Cloud & infrastructure:
• Docker — multi-stage builds, image optimisation
• AWS: ECR, IAM roles, Secrets Manager, SageMaker Serverless endpoint configuration
• CI/CD pipelines: build / test / push / deploy for ML services (GitHub Actions or similar)
ML & NLP:
• PyTorch, Hugging Face Transformers, spaCy, FAISS
• Hands-on experience running and tuning small language models (≤7B params) — spinning them up, stress-testing, optimising for latency and throughput
• Familiarity with quantisation formats and tooling (GGUF, ONNX, bitsandbytes) or model distillation
Nice to have:
• RAG pipeline experience