Based in Barcelona, Qustodio is a fast-growing internet safety startup whose mission is to provide a safe digital experience for every child. Our top product is a multi-platform parental control solution that is used and loved by hundreds of thousands of families worldwide, and is one of the leading brands in the Digital Wellbeing category.
Qustodio is growing and we have some amazing challenges to face. In our Data Science Team, we design and implement data-driven algorithms used by Qustodio's main products. We run large-scale automatic content classification systems that require batch and real-time data processing workflows.
We are looking for an experienced Data Engineer to join our team, someone motivated by and passionate about solving cutting-edge data processing challenges.
What you will be doing with us:
Design and implement data pipelines for business and technical requirements
Collect, clean and transform data from different sources
Make sure data infrastructure is reliable, scalable and efficient
Implement data serving layers for our business services
What we look for:
BSc/MSc academic background or equivalent experience
3+ years of experience in data engineering, building and operating large-scale data pipelines
Good knowledge of the software development lifecycle, including CI/CD, GitHub, JIRA and Confluence
Advanced coding skills, preferably in Python
Broad knowledge of database technologies: relational (MySQL, PostgreSQL, Redshift, …) and non-relational (DynamoDB, MongoDB, Elasticsearch, …)
Experience using the Hadoop ecosystem, especially as provided by AWS
Experience using and implementing data flows in data and ML processing frameworks such as Spark and Hive
Experience using pipeline orchestration technologies like Airflow
Knowledge of systems engineering to optimize performance of data services
Experience with AWS technologies for data processing: Glue, EMR…
Knowledge of data streaming systems: Kafka, Amazon Kinesis...
Bonus points for:
Experience using Machine Learning toolkits. Deep learning frameworks like PyTorch or TensorFlow would be a big plus
Experience with container deployment on Kubernetes and its ecosystem (Grafana, Prometheus, …)
What we can offer you: