Roles and Responsibilities:
Building end-to-end data pipelines for ML models and other data-driven solutions, such that the pipeline is directly usable for deployment/implementation.
Building and maintaining data pipelines: data cleaning, transformation, roll-up, pre-processing, etc.
Building/developing data-insight solutions for various teams such as Credit, Collections, Distribution, Vigilance, HR, etc.
Building automation solutions using Python, SQL, Docker, etc. as required.
Technical Skills:
Must have:
Primary skill set:
High proficiency in Python coding, along with good knowledge of SQL (joins, nested queries, etc.)
Data analysis experience: able to understand and identify the data points and the data-acquisition mechanism for structured and unstructured data (text/JSON/XML) feeding a machine-learning data pipeline.
Knowledge of Python libraries such as pandas and SQLAlchemy (or other Python SQL libraries); good to have: matplotlib, NumPy, SciPy, scikit-learn, NLTK. A short sketch combining these appears after this list.
Working knowledge of Git repositories (GitHub, GitLab, etc.)
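For illustration, the level of Python + SQL fluency expected is roughly that of the following minimal sketch, which joins two tables and uses a nested query via pandas and SQLAlchemy. The schema (customers, loans) and all values are hypothetical, used only to keep the example self-contained:

    # Minimal pandas + SQLAlchemy sketch; the schema and values are
    # hypothetical and exist only to make the example runnable.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("sqlite://")  # in-memory database for the sketch

    # Seed two small illustrative tables.
    pd.DataFrame({"customer_id": [1, 2], "name": ["A", "B"]}).to_sql(
        "customers", engine, index=False)
    pd.DataFrame({"customer_id": [1, 1, 2], "amount": [100.0, 250.0, 80.0]}).to_sql(
        "loans", engine, index=False)

    # A join plus a nested query: customers whose total loan amount
    # exceeds the overall average loan amount.
    query = """
    SELECT c.customer_id, c.name, SUM(l.amount) AS total_amount
    FROM customers AS c
    JOIN loans AS l ON l.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    HAVING SUM(l.amount) > (SELECT AVG(amount) FROM loans)
    """
    print(pd.read_sql(query, engine))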
Data management skill sets:
Ability to understand data models and create ETL jobs using Python scripts (a minimal sketch follows this list).
Automate regular data acquisition, application processes, etc. using Python scripts.
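To make the ETL expectation concrete, here is a minimal extract-transform-load sketch; the file name, column names, and target table are hypothetical:

    # Minimal ETL sketch: extract a CSV, clean and roll it up, load it
    # into a database table. All file/table/column names are hypothetical.
    import pandas as pd
    from sqlalchemy import create_engine

    def run_etl(csv_path: str, engine) -> None:
        df = pd.read_csv(csv_path)                 # extract
        df = df.dropna(subset=["account_id"])      # transform: drop bad rows
        df["amount"] = df["amount"].astype(float)  # enforce numeric type
        rollup = df.groupby("account_id", as_index=False)["amount"].sum()
        rollup.to_sql("monthly_rollup", engine,    # load
                      if_exists="replace", index=False)

    if __name__ == "__main__":
        run_etl("payments.csv", create_engine("sqlite:///warehouse.db"))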
Good to have (must be open to learning these if not already familiar):
Web API technology:
Experience in REST API development using any of Django, Flask, FastAPI, etc. (highly appreciated); a minimal FastAPI sketch follows this list.
Deployment of web APIs on the cloud using Docker (highly appreciated).
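As a flavour of the web-API work, a minimal FastAPI sketch follows; the endpoint name, payload shape, and scoring rule are purely illustrative, not a prescribed service:

    # Minimal FastAPI sketch; endpoint, payload, and rule are illustrative.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class ScoreRequest(BaseModel):
        account_id: int
        amount: float

    @app.post("/score")
    def score(req: ScoreRequest) -> dict:
        # Placeholder rule; a real deployment would call a trained model.
        return {"account_id": req.account_id, "risk_flag": req.amount > 100_000}

Such an app can be served locally with uvicorn and containerized with a standard Python Docker base image for cloud deployment.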
Desired Profile:
Working knowledge of Linux (highly appreciated).
Should be able to work on problems independently or with minimal support.
Conceptual understanding of big data and working knowledge of Spark (PySpark); a short sketch follows this list.
Cloud experience (AWS/Azure/GCP)
Master’s degree in a technical/quantitative field (MCA, Computer Science, Engineering + MBA with coding expertise; or an undergraduate degree in Statistics, Maths, Economics, Physics, etc. with a relevant Master’s degree).
3+ years of experience using Python and SQL.
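For the PySpark item above, a minimal sketch of the kind of distributed roll-up implied; the input path and column names are hypothetical:

    # Minimal PySpark sketch: read a dataset and compute a grouped roll-up.
    # Input path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("rollup-sketch").getOrCreate()
    df = spark.read.csv("payments.csv", header=True, inferSchema=True)
    rollup = df.groupBy("account_id").agg(F.sum("amount").alias("total_amount"))
    rollup.show()
    spark.stop()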