Senior Data Engineer
Tools for Humanity (Worldcoin LA)
About the Company:
Worldcoin (www.worldcoin.org) is an open-source protocol, supported by a global community of developers, individuals, economists and technologists committed to expanding participation in, and access to, the global economy. Its community is united around core beliefs in the inherent worth and equality of every individual, the right to personal privacy, and open and public collaboration. These beliefs are reflected in what the community is building: a public utility to connect everyone to the global economy.
The Worldcoin Foundation (www.worldcoin.foundation) is the protocol’s steward and will support and grow the Worldcoin community until it becomes self-sufficient. Tools for Humanity (www.toolsforhumanity.com) is a global hardware and software development company. It helped launch Worldcoin and continues to provide support to the Foundation, in addition to operating the World App.
This opportunity would be with Tools for Humanity.
About the AI & Biometrics Team:
The AI & Biometrics team is building a biometric iris recognition system that can work reliably with more than a billion users and enables them to claim their free share of WLD. We use cutting-edge machine learning deployed on custom hardware to enable high-quality image acquisition, identification, and fraud prevention, all while requiring minimal user interaction. Our technology, coupled with privacy-preserving data collection, allows us to increase system performance and reduce model bias.
We are building an iris recognition and fraud detection engine that works at the scale of one billion people, so it needs to outperform all current iris recognition technologies. We leverage our powerful custom-made iris recognition device, the Orb, combined with the latest research in AI and deep learning.
About the Opportunity:
We are seeking a Dataset Engineer who will play a pivotal role in the backbone of our machine learning work: the datasets. In this critical position, you'll be at the forefront of our MLOps framework, focusing on data ingestion, annotation, and dataset orchestration. You will have the opportunity to build and maintain the infrastructure that fuels our machine learning algorithms, ensuring that the data is accurate, accessible, and ready to use for model training.
In this role you will:
- Design, implement and maintain automations for data ingestion and human annotation request pipelines to ensure data availability to multiple ML workstreams
- Collaborate with our AI researchers and engineers to determine dataset requirements and build tooling that helps create high-quality training and evaluation datasets
- Utilise tools like Docker, Kubernetes and Terraform to deploy, scale and manage the infrastructure supporting our dataset operations
- Work closely with other team members, including AI researchers, software engineers and product managers, to ensure alignment and smooth dataset delivery based on current needs
About You:
- 3 years of industry experience in Data Engineering, Software Engineering, Computer Vision or a related field
- Strong foundation in Python, including frameworks for data and image manipulation such as Pandas and OpenCV
- Strong experience with Docker, Kubernetes, and Terraform
- Proficiency in working with MongoDB and AWS services
- Familiarity with continuous integration, preferably with GitHub CI
- Solid understanding of data pipelines, ETL processes, statistical analysis and data quality best practices
- Previous work in Computer Vision is a nice-to-have