This is Anuvrat here! I currently work as a Principal Software Engineer at Amazon Web Services (AWS), but at heart I am a problem solver, operating at the intersection of business and technology. For most of my Amazon career I have worked in the measurement, experimentation and analytics domains. Over the last few years I have become interested in the ethics of data processing, starting with the advertising industry, since that’s what I’m closest to. I know I’ve only dipped my toes in this space and there’s a lot more to learn.
Outside of work, I love to hike. Growing up on the Deccan Plateau, I never knew what it meant to live near the mountains. I imagined that mountains existed only in the frigid north and were unwelcoming territory. Since moving to the Pacific Northwest, I cannot get enough of them. In them I have discovered peace and joy.
Finally, I have picked up chess again, after more than 20 years. There’s so much to learn!
Writing Into Dynamic Partitions Using Spark
Hive has this wonderful feature of partitioning: a way of dividing a table into related parts based on the values of certain columns. Using partitions, it’s easy to query a portion of data, because Hive can skip the partitions a query does not need and load only the relevant ones. Writing data into partitions is very easy. You have two options:…
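To make the idea of dynamic partitions concrete, here is a minimal Python sketch, not Spark's or Hive's API. Each row picks its own partition from its column values instead of the partition being fixed up front, and each distinct combination lands in its own `column=value` directory, the layout Hive uses for partitioned tables. A second helper shows partition pruning, where a filter on a partition column selects whole directories without looking inside them. The names `partition_rows` and `prune` are hypothetical, and the sketch ignores the escaping and null handling Hive applies to partition values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def partition_rows(rows: Iterable[Row], partition_cols: List[str]) -> Dict[str, List[Row]]:
    """Group rows by their partition values, Hive style (e.g. 'dt=2020-01-01/country=IN').

    Hypothetical helper: each row chooses its own partition from its column values,
    which is what makes the partitioning dynamic. The partition columns are dropped
    from the stored rows because the directory path already encodes them.
    """
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        path = "/".join(f"{col}={row[col]}" for col in partition_cols)
        partitions[path].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(partitions)


def prune(partitions: Dict[str, List[Row]], **filters: object) -> Dict[str, List[Row]]:
    """Hypothetical helper: keep only partitions whose path matches every col=value filter."""
    selected = {}
    for path, rows in partitions.items():
        # Recover the partition spec from the path, e.g. {'dt': '2020-01-01', 'country': 'IN'}.
        spec = dict(part.split("=", 1) for part in path.split("/"))
        if all(spec.get(col) == str(val) for col, val in filters.items()):
            selected[path] = rows
    return selected


rows = [
    {"id": 1, "dt": "2020-01-01", "country": "IN"},
    {"id": 2, "dt": "2020-01-01", "country": "US"},
    {"id": 3, "dt": "2020-01-02", "country": "IN"},
]
parts = partition_rows(rows, ["dt", "country"])
print(sorted(parts))
# ['dt=2020-01-01/country=IN', 'dt=2020-01-01/country=US', 'dt=2020-01-02/country=IN']
print(sorted(prune(parts, country="IN")))
# ['dt=2020-01-01/country=IN', 'dt=2020-01-02/country=IN']
```

Because the path is built by splitting on `/` and `=`, partition values containing those characters would break this sketch; that is exactly the case Hive's escaping exists for.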
Parse Json in Hive Using Hive JSON Serde
In an earlier post I wrote a custom UDF to read JSON into my table. Since then, I have also learnt about and used the Hive-JSON-Serde. I will use the same example as before. Now, using the Hive-JSON-Serde you can parse the above JSON record as: This is really great! I can now parse more…
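Hive reads a table's files through a SerDe (serializer/deserializer) that turns each stored record into a row. As a rough illustration of the deserializing half only, the Python sketch below maps a line of JSON onto a table's declared columns, assuming one JSON object per line. It is a conceptual analogue, not Hive-JSON-Serde's code: `deserialize_json_line` is a hypothetical name, keys are matched to column names exactly, and the real SerDe's rules for nested types, key case and malformed records are not reproduced.

```python
import json
from typing import List, Optional


def deserialize_json_line(line: str, columns: List[str]) -> List[Optional[object]]:
    """Map one JSON record (one object per line) onto a table's columns, in column order.

    Hypothetical helper illustrating the SerDe idea only: keys are matched to column
    names exactly, and a key missing from the record becomes None (a NULL).
    """
    record = json.loads(line)
    return [record.get(col) for col in columns]


columns = ["name", "age", "tags"]
lines = [
    '{"name": "alice", "age": 30, "tags": ["a", "b"]}',
    '{"name": "bob"}',
]
for row in (deserialize_json_line(line, columns) for line in lines):
    print(row)
# ['alice', 30, ['a', 'b']]
# ['bob', None, None]
```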
SPOJ | NICEDAY — The Day of the Competitors
Problem: Contestants are evaluated in three competitions. We say that contestant A is better than contestant B if A is ranked above B in all three competitions they were evaluated in. A is an excellent contestant if no other contestant is better than A. Given the ranks of all the contestants that participated…
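One common way to count excellent contestants, sketched here in Python under stated assumptions rather than as the post's own solution, is a sweep plus a Fenwick (binary indexed) tree. Visit contestants from best to worst in competition 1; everyone already visited beats the current contestant there, so the current contestant is beaten in all three exactly when some visited contestant also has a lower competition-2 rank and a lower competition-3 rank. A Fenwick tree indexed by competition-2 rank that stores the minimum competition-3 rank answers that in O(log n), for O(n log n) overall. The sketch assumes each competition's ranks are a permutation of 1..n (no ties, lower is better) and leaves out reading the SPOJ input; `count_excellent` is a hypothetical name.

```python
from typing import List, Tuple


def count_excellent(ranks: List[Tuple[int, int, int]]) -> int:
    """Count contestants that no one beats in all three competitions.

    Assumes each competition's ranks form a permutation of 1..n, lower being better.
    """
    n = len(ranks)
    inf = n + 1
    # Fenwick tree over competition-2 ranks; each node stores the minimum
    # competition-3 rank among the contestants inserted so far in its range.
    tree = [inf] * (n + 1)

    def update(i: int, value: int) -> None:
        while i <= n:
            if value < tree[i]:
                tree[i] = value
            i += i & -i

    def prefix_min(i: int) -> int:
        # Minimum competition-3 rank among inserted contestants with competition-2 rank <= i.
        best = inf
        while i > 0:
            best = min(best, tree[i])
            i -= i & -i
        return best

    excellent = 0
    # Sorting by competition-1 rank means every contestant already in the tree
    # beats the current one in competition 1.
    for _, r2, r3 in sorted(ranks):
        # Nobody beats us in all three iff no earlier contestant has both a lower r2 and a lower r3.
        if prefix_min(r2 - 1) > r3:
            excellent += 1
        update(r2, r3)
    return excellent


print(count_excellent([(1, 2, 3), (2, 1, 1), (3, 3, 2)]))  # 2
```

In the example, the third contestant (3, 3, 2) is beaten by (2, 1, 1) in every competition, so only the first two are excellent.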
Writing UDF To Parse JSON In Hive
Sometimes we need to perform data transformations in ways too complicated for SQL (even with the built-in UDFs that Hive provides). Let’s take JSON manipulation as an example. JSON is widely used to store and transfer data. Hive comes with a built-in json_tuple() function that can extract values for multiple keys at once. But if…
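To show what "extract values for multiple keys at once" buys you, here is a rough Python analogue of Hive's json_tuple(): the JSON string is parsed once and several keys are pulled out of it, each coming back as a string or as None for NULL. The function below is hypothetical Python, not a Hive API. Returning nested values as compact JSON text, and returning all NULLs for input that is not a parseable JSON object, are this sketch's assumptions about the edge cases.

```python
import json
from typing import Optional, Tuple


def json_tuple(json_str: Optional[str], *keys: str) -> Tuple[Optional[str], ...]:
    """Hypothetical Python analogue of Hive's json_tuple(): parse once, extract many keys.

    Values come back as strings (None stands for NULL); nested objects and arrays
    come back as compact JSON text. Edge-case handling here is an assumption.
    """
    nulls = tuple(None for _ in keys)
    if json_str is None:
        return nulls
    try:
        obj = json.loads(json_str)  # the single parse shared by every key
    except json.JSONDecodeError:
        return nulls  # unparseable input: every requested column is NULL
    if not isinstance(obj, dict):
        return nulls
    result = []
    for key in keys:
        value = obj.get(key)
        if value is None:  # missing key or JSON null
            result.append(None)
        elif isinstance(value, str):
            result.append(value)
        else:  # numbers, booleans, nested objects and arrays as JSON text
            result.append(json.dumps(value, separators=(",", ":")))
    return tuple(result)


print(json_tuple('{"name": "a", "n": 1, "tags": ["x"]}', "name", "n", "tags", "missing"))
# ('a', '1', '["x"]', None)
```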