Freelancing with Data

Documenting everyday problems and how to find solutions.

For the first time in my career, I am freelancing for a small startup. I am documenting this journey as I go along.

It's a small company trying to create a community of gamers and game developers, and to make a fortune by increasing game engagement.

Where do I fit in?

Like every other company, they are struggling with ever-increasing cloud costs, mismatching numbers, and broken analytics.

I am here to fix their problem: how do I get my numbers?

Their tech stack is hosted entirely on AWS. Until now, they were getting by happily with a combination of Lambda functions, Step Functions, and Glue crawlers to build their data lake on Athena.

Now, with increasing data, these Lambda functions are failing: a single Lambda invocation cannot run beyond 15 minutes, so the longer jobs time out before they finish.
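
To make the 15-minute ceiling concrete, here is a minimal sketch of a handler that watches its remaining time and stops early, handing back a cursor so a follow-up invocation can resume. This is not the startup's actual code. The event shape (`records`, `cursor`), the `process_record` helper, and the one-minute safety margin are all assumptions for illustration. The one real piece is `context.get_remaining_time_in_millis()`, which the Lambda Python runtime provides on the context object.

```python
import time

# Assumed safety margin: stop one minute before the hard timeout.
SAFETY_MARGIN_MS = 60_000


def process_record(record):
    """Hypothetical per-record transformation; replace with real work."""
    return record * 2


def handler(event, context):
    """Process records until the time budget runs low, then return a cursor."""
    records = event["records"]          # assumed event shape
    index = event.get("cursor", 0)      # where a previous run stopped
    results = []
    while index < len(records):
        # Provided by the Lambda context object in the Python runtime.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        results.append(process_record(records[index]))
        index += 1
    return {"results": results, "cursor": index, "done": index >= len(records)}


class FakeContext:
    """Local stand-in for the Lambda context so the sketch runs off AWS."""

    def __init__(self, budget_ms):
        self._deadline = time.monotonic() + budget_ms / 1000

    def get_remaining_time_in_millis(self):
        return int((self._deadline - time.monotonic()) * 1000)
```

Calling `handler({"records": [1, 2, 3]}, FakeContext(900_000))` processes everything. With a budget below the margin, it returns immediately with `done` set to `False`. Chaining invocations like this works, but it is exactly the kind of bookkeeping that gets painful as jobs grow.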

In some cases the data models are not defined properly, which makes the Athena queries run longer. And the data scientists are struggling to schedule some complex transformation jobs in their system.

My first step is setting up a self-managed Airflow on an EC2 host. This will help me build their pipelines and get a better handle on managing them.

Is this the best way to go about it? Only time will tell. Until then, keep reading Just data problems.

Thanks!

__thatpyjamagirl \m/