Robotics AI Engineer (RL, Vision Language Action)

Job Type: Full Time
Job Location: Ha Noi

INTRODUCTION

You will design and train next‑generation Vision‑Language‑Action (VLA) models for humanoid robots that understand natural language instructions, perceive complex scenes, and act safely in real industrial environments. You will focus on learning from limited teleoperation data and reducing the gap between real demos and synthetic simulation (e.g., Isaac/Omniverse, NVIDIA Cosmos) to build generalizable, safety‑aware policies for humanoids in factories and logistics. You will work closely with Teleoperation, RL & Controls, Simulation, and Platform teams to bring VLA models from research into production robots.

JOB DESCRIPTION

  • Design and implement VLA architectures that take RGB/depth images, language instructions, robot state, and task history as inputs and generate actions (pose targets, motion primitives, or low‑level controls).
  • Develop training strategies for low‑data, noisy teleoperation datasets, including strong augmentations, self‑/semi‑supervised pretraining, multi‑task learning, and behavior cloning / offline RL hybrids.
  • Tackle distribution shift between real demos and simulation, using domain randomization, sim‑parameter sampling, and joint training on real and synthetic data.
  • Build synthetic data pipelines in high‑fidelity humanoid simulators (Isaac/Omniverse) and integrate world‑model–based data generation (e.g., NVIDIA Cosmos) to create diverse scenarios and edge‑case curricula.
  • Define metrics and test suites (task success, safety, instruction following, sim‑to‑real gap), run ablations, and collaborate with RL/Controls teams to deploy VLA policies on real robots.

REQUIREMENTS

  • Strong background in deep learning for sequence / multimodal modeling (transformers, diffusion models, recurrent or latent world‑model architectures).
  • Hands‑on experience building and training vision‑language or VLA‑style models (e.g., VLMs, embodied LLMs, language‑conditioned policy networks).
  • Solid understanding of at least one of:
    • Imitation learning / behavior cloning
    • Offline / batch RL
    • Inverse RL or preference‑based learning
  • Experience with robot learning from demonstration or teleoperation data (any platform; humanoid experience is a plus).
  • Strong engineering skills in Python and modern ML frameworks (PyTorch preferred), including clean training loops, data pipelines, and experiment management (configs, logging, basic MLOps).
  • Bachelor’s/Master’s/Ph.D. in Computer Science, Robotics, Electrical Engineering, or related field; or equivalent industry experience.

PREFERRED SKILLS

  • Experience with NVIDIA physical‑AI stacks: Isaac (Sim/Lab), Omniverse, or NVIDIA Cosmos world foundation models for synthetic data generation and sim‑to‑real workflows.
  • Prior work on autonomous robots (control, perception, or policy learning) or complex articulated robots in industrial settings.
  • Contributions to embodied AI / robot learning (papers, open‑source projects, widely used codebases).
  • Familiarity with safety‑critical robotics: safe action constraints, human‑in‑the‑loop supervision, and fallback mechanisms.
  • Experience deploying models on GPU clusters or edge devices (latency/memory profiling, batching, mixed precision).

BENEFITS

  • Salary: up to $2,000 per month, negotiable based on the candidate’s experience and capability.
  • Lunch allowance, holiday bonuses, 13th-month salary.
  • Birthday leave, sick leave, maternity leave, and welfare policies in accordance with the Vietnam Labor Code.
  • Overtime pay in accordance with legal regulations (150–300% rate depending on time).
  • Full social insurance contribution; private health insurance for employees and managers.
  • Opportunity to work on cutting‑edge humanoid robotics and see your VLA/RL work deployed in real industrial environments.
  • Collaborative, fast‑paced environment with cross‑functional teams in robotics, simulation, and AI.

CONTACT

Apply for this position

Allowed file type(s): .pdf, .doc, .docx