Storage Engineer
TensorWave
Location: Las Vegas, Nevada
Employment Type: Full time
Location Type: On-site
Department: Engineering
Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.
About the role
We are looking for a Storage Engineer with deep expertise in NFS-based storage and modern high-performance file systems, specifically VAST Data and WEKA. This role exists to ensure our shared storage platforms are fast, reliable, scalable, and boring — even under extreme load.
You will own the design, operation, and performance of our file storage layer, supporting workloads that depend on low latency, high throughput, and predictable behavior. This is a hands-on role for someone who understands storage at the protocol and system level, not just from a dashboard.
If you think in terms of NFS semantics, metadata performance, failure domains, and throughput per node, this role is for you.
Responsibilities
Design, deploy, and operate NFS-based storage systems for production workloads
Own and operate VAST Data and WEKA clusters in production environments
Architect storage for high-throughput, low-latency shared file access
Tune and optimize NFS performance (mount options, client behavior, server-side tuning)
Manage capacity planning, scaling, and rebalancing for VAST and WEKA systems
Diagnose and resolve storage performance issues (latency spikes, metadata bottlenecks, throughput drops)
Design and test failure and recovery scenarios (node failures, network issues, disk loss)
Lead upgrades, expansions, and maintenance with minimal or zero downtime
Partner with infrastructure and application teams to ensure workloads are well-matched to storage behavior
Document operational runbooks and establish best practices for shared file storage
You Are Obsessed With:
NFS that behaves predictably under load
Consistent latency and throughput at scale
Understanding exactly how storage fails — before it does
File systems that scale without becoming fragile
Making shared storage invisible to users because it just works
Required Experience
Strong hands-on experience with NFS in production environments
Direct experience operating VAST Data and/or WEKA systems
Deep understanding of distributed file systems and shared storage architectures
Strong knowledge of storage performance fundamentals (latency, throughput, metadata operations)
Experience troubleshooting complex storage and networking interactions
Solid Linux systems knowledge, especially around filesystem and I/O behavior
Ability to reason about failure domains, recovery paths, and data integrity
Preferred Experience
Experience supporting AI/ML, HPC, or data-intensive workloads
Familiarity with RDMA, high-speed networking, or NVMe-based storage
Experience with Kubernetes workloads backed by shared file systems
Experience with multi-rack or multi-site storage deployments
Infrastructure-as-Code or automation experience
What We Bring
Mission-driven company
Competitive Salary
Stock Options
100% paid Medical, Dental, and Vision insurance
Life and Voluntary Supplemental Insurance
Short-Term Disability Insurance
Flexible Spending Account
401(k)
Flexible PTO
Paid Holidays
Parental Leave
Mental Health Benefits through Spring Health
We’re looking for resilient, adaptable people to join our team: people who believe in the mission and think at massive scale. Solutions that worked on a handful of devices will not work at exascale. Be prepared to be pushed daily, to learn a lot, and to literally build the future.
TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.