Expert-level proficiency in PySpark and Python
Strong understanding of, and experience with, distributed computing frameworks, particularly Apache Hadoop (YARN, MapReduce, HDFS) and associated technologies: one or more of Hive, Sqoop, Avro, Flume, Oozie, ZooKeeper, Impala, etc.
Hands-on experience with Apache Spark and its components (Streaming, SQL, MLlib) is a strong advantage
Working knowledge of cloud computing platforms (AWS/Azure/GCP)
Experience working in a Linux environment and using command-line tools, including Shell/Python scripting to automate common tasks
Must be proficient in at least one cloud computing platform (AWS/Azure/GCP)
Experience with GCP (BigQuery/Bigtable, Pub/Sub, Dataflow, App Engine), AWS, or Azure is preferred
What You'll Do:
The role involves big data pre-processing and reporting workflows, including collecting, parsing, managing, analyzing, and visualizing large data sets to turn information into business insights
Develop the software and systems needed for end-to-end execution on large projects
Work across all phases of the SDLC and apply software engineering principles to build scalable solutions
Build the knowledge base required to deliver increasingly complex technology projects
Evaluate, develop, maintain, and test big data solutions for advanced analytics projects
Test machine learning models on big data and deploy trained models for ongoing scoring and prediction
Option to 'work from home'
About Propellor.ai (ThinkBumblebee Analytics)
Propellor.ai helps businesses do better business. In a world of data surplus and optimisation deficit, we make the lives of business owners simpler with our AI data tools that turn data into effective insights for each member of the company.
The changing face of business requires a dynamic toolkit that integrates data sources (Bond), visualises the data (Bolt), understands the user base (Box), enhances campaign conversions (Beam) and identifies customer churn (Bynd).