-
Always Specify Region When Calling DynamoDB from Hive
DynamoDB is a key-value store. One can query DynamoDB tables from Hive using the DynamoDBStorageHandler. It’s super easy to set up. Let’s say we have built a platform that collects data for various clients, processes the data and outputs the processed data per client. For our example, let’s say each client can be identified by…
-
CamelCase Partition Column is a Bad Idea in Hive
Outside Java code, I prefer snake_case over camelCase. This is mostly a preference without any strong reason: without a proper IDE, I find it easier to read snake_case words than camelCase words. Python’s naming convention uses snake_case for variable names and reserves capitalized camel case (CapWords) for class names. Languages like MySQL, Hive, etc. convert everything…
-
Reusing Hive Scripts
AWS Data Pipeline does a fine job of scheduling data processing activities. It spawns a cluster and executes a Hive script when the data becomes available. After all the jobs have completed, the pipeline shuts down the EMR resource and exits. Since the cluster is only created and in use while the scripts are…
-
Scrapy | Crawl WhoScored For Football Stats
Earlier, I wrote code to crawl the Google Play, iTunes AppStore and Goal.com websites. But every time, I rewrote the code to fetch content from the website and parse it using BeautifulSoup, while maintaining a list of crawled URLs to avoid crawling them again. This was a lot of work. A while ago, I discovered Scrapy. It’s…
-
Crawl iTunes AppStore To Get List of All Apps
Git Repo: https://github.com/anuvrat/scrape-google-play/ Thankfully, the Apple AppStore provides a nice index to look up all the apps. All the apps have been categorized into 23 broad categories. Within each category, the apps have been indexed alphabetically. So, to discover all the apps in the iTunes AppStore, one only needs to crawl the main index page, find all the category…
-
Crawl Google Play to Get List of All Apps
GitHub Repo: https://github.com/anuvrat/scrape-google-play Unlike Apple, Google does not provide a list of all the apps in the Google Play store. There’s no index that links to all active apps in its marketplace (Apple has a nice alphabetical index, per category, of all apps in the iTunes AppStore). The only way to discover apps in…
-
LASIK
I have finally decided to go through with LASIK surgery. I have actually convinced my parents to move the surgery up to this December rather than wait till the summer break. Hopefully, if I get placed earlier, I will be able to return home in time and be free of the spectacles. LASIK is a…