Archive for the 'Coding' Category

Your ClassNotFound Error Is Probably Not Telling You Everything

Quicktip
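If a class fails to load because an exception is thrown during class initialization, the actual problem is only logged the first time you attempt to use the class (typically as an ExceptionInInitializerError wrapping the real exception). After the first time, the JVM recognizes that it has already tried to initialize the class and just throws a ClassNotFoundException or NoClassDefFoundError (for a failed static initializer, usually NoClassDefFoundError: Could not initialize class).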

Symptom
You will see logs for the ClassNotFoundException or NoClassDefFoundError (usually many such logs), but you see that the class is in the classpath.  You won’t see any root cause on any stack trace but the first (and that one is typically not a CNFE or NCDFE).

Finding the root cause
If you get a CNFE or NCDFE and you can see the class in the classpath, search back through your logs for ${missing classname}.<clinit> in a stack trace to figure out what prevented the class from being initialized. Remember that the log may have rolled off.

Possible root causes for CNFE/NCDFE

  • Desired class is not in classpath (this is the boring case)
  • Initialization of desired class throws a RuntimeException or an Error
    • Static variable is initialized via a function which threw an uncaught Throwable
      • E.g. public static final String SOME_CONST = SomeClass.getString("SOME_KEY"); where SomeClass.getString(String key) can throw an unchecked exception.
    • Static block (code in static {} in the class definition) threw an uncaught Throwable
      • E.g. public class Foo { static { doSomeStaticInitialization(); } …
    • Variable or method signature includes a type which could not be initialized (see this same list of root causes)
      • E.g. public class Foo { SomeClassWithErrorInInitializer attr1; …

From The SDE Tip – Amazon
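
To see the symptom in isolation, here is a minimal sketch of my own (not part of the original tip). Broken is a hypothetical class whose static field initializer always fails, and touch() is a made-up helper that reports which Throwable type came out. On the JVMs I am aware of, the first access surfaces the real cause wrapped in an ExceptionInInitializerError, and every later access only gets a NoClassDefFoundError.

```java
public class InitFailureDemo {
    // Hypothetical class: its static field initializer always throws.
    static class Broken {
        static final String VALUE = load();

        static String load() {
            throw new IllegalStateException("config missing");
        }
    }

    // Touches Broken.VALUE and returns the simple name of whatever was thrown.
    static String touch() {
        try {
            return Broken.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: the wrapped root cause
        System.out.println(touch()); // later uses: only "could not initialize class"
    }
}
```

Only the first stack trace contains InitFailureDemo$Broken.<clinit> and the IllegalStateException, which is why searching the logs for <clinit> pays off.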


SQL Joins



SQL: Load Data Infile With Column Value Modification

My requirement was to load data into a table from a tab-delimited file while adding an extra column. Here’s how I did it:

In fact, we can do more. We can assign a column's value to a variable and apply some transform to it.


WTF Code: Hardcode URL When Cleaning

At Amazon, everyone follows a Service Oriented Architecture. We use the PAAPI (Product Advertising API) to create links to products in various marketplaces. Both PAAPI and we use the tag parameter to identify the user who sent the request. So we always clean the URLs returned by PAAPI to strip off the tag.

I looked around the code and found that we have a method which takes a list of parameters to be removed and cleanses the URL. Good. I had to integrate www.javari.jp into our widget server. When testing I found that all the URLs were defaulting to www.amazon.co.jp instead of javari! Now this was baffling, because clearly the URL cleaner did no such thing as change the domain. I went deeper into the code and what I found surprised me!

For some reason, unknown to anyone in the team, someone had decided to comment out the piece of code calling that url cleaner method and instead chose to write his own cleaner. This is what he did:

As I understand from the above code, all that he wanted to do was strip out the GET parameters. But that can very easily be done by using Java's URL class. Given a valid URL, java.net.URL has all the logic to parse it and spit out the various segments. My modified code was:

Simple and easy as it can get!
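
For illustration, here is a minimal sketch of that idea. It is not the code from the widget server. UrlCleaner and stripQuery are hypothetical names, and the sample URL in main is made up. It rebuilds the URL from the protocol, host, optional port and path that java.net.URL parses out, so the query string and fragment are dropped and the domain is left alone. Note that recent JDKs flag the URL constructor as deprecated; it still works for a sketch like this.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlCleaner {
    // Hypothetical helper: keeps protocol, host, port and path; drops query and fragment.
    static String stripQuery(String rawUrl) {
        final URL url;
        try {
            url = new URL(rawUrl);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + rawUrl, e);
        }
        StringBuilder cleaned = new StringBuilder();
        cleaned.append(url.getProtocol()).append("://").append(url.getHost());
        if (url.getPort() != -1) {
            cleaned.append(':').append(url.getPort()); // only present when set explicitly
        }
        cleaned.append(url.getPath());
        return cleaned.toString();
    }

    public static void main(String[] args) {
        // Made-up example URL; prints http://www.javari.jp/dp/B000000000
        System.out.println(stripQuery("http://www.javari.jp/dp/B000000000?tag=example-22&linkCode=xm2"));
    }
}
```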


Shell: Get Partition With Max Disk Space

I wanted to find the partition that has the maximum disk space and its mount point. This is the script that Anuj and I (mostly Anuj) came up with.

Explanations:

  • df gives us the disk space.
  • The first awk sorts all the lines numerically based on the 2nd column, starting from the second line.
  • The second awk prints the 6th column which gives the mounted partition information.
  • Finally the combination of head and tail gives us the top result.
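
The same steps can be sketched in Java over captured df output. This is a swapped-in illustration of the logic above, not the shell script itself. LargestPartition and largestMount are hypothetical names, and the sketch assumes the usual six-column df layout with a numeric 2nd column and no spaces in mount points.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

public class LargestPartition {
    // Skips the header line, picks the row with the largest 2nd column (size),
    // and returns its 6th column (mount point).
    static Optional<String> largestMount(String dfOutput) {
        return Arrays.stream(dfOutput.split("\n"))
                .skip(1)                                        // drop the header row
                .map(line -> line.trim().split("\\s+"))
                .filter(cols -> cols.length >= 6)               // ignore wrapped or short rows
                .max(Comparator.comparingLong((String[] cols) -> Long.parseLong(cols[1])))
                .map(cols -> cols[5]);
    }

    public static void main(String[] args) {
        String sample = "Filesystem 1K-blocks Used Available Use% Mounted on\n"
                + "/dev/sda1 100 50 50 50% /\n"
                + "/dev/sdb1 500 10 490 2% /data\n";
        System.out.println(largestMount(sample).orElse("none")); // prints /data
    }
}
```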


Hive UDF: Cannot Run Program – No Such File Or Directory

We are using Amazon EMR to run Hive. I wrote a Perl script to carry out certain transformations. This script is stored in S3 and has executable permission for all users. However, when I use the script I get an error saying the program could not be run because no such file or directory was found!

I confirmed that Hadoop did download the file and that it has all permissions set.

Baffled, I googled around to see if people have had this issue. And I found a match in one of the AWS Developer Forums - https://forums.aws.amazon.com/message.jspa?messageID=126905.

To quote:

Hadoop fetches your file from S3 and puts it in the distributed cache before starting the job. During this process Hadoop flips the executable bit of the file off and the file is no longer an executable in the distributed cache. The error message is a bit misleading, but you should be able to get it to work if you explicitly invoked PHP.

Taking the hint, I modified my HiveQL to invoke the script explicitly through the perl interpreter instead of running it directly.

And things now work like a charm.


Hive: User Defined Functions

Writing user defined functions in Perl for Hive is so easy! Say you have a table in Hive that has tab separated columns. Your script will receive the same tab separated columns on STDIN, one row per line (separated by \n). You can do whatever processing you want to. Then just print everything back to STDOUT in the same format: columns separated by tabs and rows by \n.

For example, suppose you have a table with tab separated columns full_name, address, email_id. You want to write a function to parse the full_name and extract the first and the last name. Your script will be:

And you can call this script from hive like this:

In fact, you could form the below query to select all the people whose last name is ‘Singh’.
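
The stdin/stdout contract described above is not specific to Perl. As a rough sketch in Java (my own illustration, not the Perl script from this post), the name-splitting transform could look like the following. SplitName and transform are hypothetical names, and treating the last space-separated word as the last name is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class SplitName {
    // Turns "full_name<TAB>address<TAB>email_id"
    // into "first_name<TAB>last_name<TAB>address<TAB>email_id".
    static String transform(String row) {
        String[] cols = row.split("\t", -1);            // -1 keeps trailing empty columns
        String fullName = cols[0].trim();
        int lastSpace = fullName.lastIndexOf(' ');
        String first = lastSpace < 0 ? fullName : fullName.substring(0, lastSpace);
        String last = lastSpace < 0 ? "" : fullName.substring(lastSpace + 1);
        StringBuilder out = new StringBuilder(first).append('\t').append(last);
        for (int i = 1; i < cols.length; i++) {
            out.append('\t').append(cols[i]);           // pass the other columns through
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String row;
        while ((row = in.readLine()) != null) {
            System.out.println(transform(row));         // one output row per input row
        }
    }
}
```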


Recursively Remove .svn Folders

I was moving my svn projects to a git repository. I needed to get rid of all the .svn folders recursively. Following is the script that I used:
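
If a shell is not handy, the same cleanup can be sketched with java.nio. This is an alternative illustration, not the script referred to above. SvnCleaner, removeSvnDirs and deleteTree are hypothetical names. It deletes files for real, so try it on a copy first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SvnCleaner {
    // Deletes a directory tree, children before parents.
    static void deleteTree(Path dir) throws IOException {
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(dir)) {
            deepestFirst = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.delete(p);
        }
    }

    // Finds every directory named ".svn" under root, deletes it, and returns the count.
    static int removeSvnDirs(Path root) throws IOException {
        List<Path> svnDirs;
        try (Stream<Path> paths = Files.walk(root)) {
            svnDirs = paths.filter(Files::isDirectory)
                    .filter(p -> p.getFileName() != null && p.getFileName().toString().equals(".svn"))
                    .collect(Collectors.toList());
        }
        int removed = 0;
        for (Path dir : svnDirs) {
            if (Files.exists(dir)) {        // may already be gone if nested in another .svn
                deleteTree(dir);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Removed " + removeSvnDirs(root) + " .svn folders");
    }
}
```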


Groovy : Parse All Soccer Players Info

I am new to Groovy and, coming from Java, am still getting used to the scripting way of things. So as a learning exercise I wrote the following lines to parse information about all the soccer players from ESPN Soccernet. I have used the Jsoup library to get the document and parse it.

import org.jsoup.nodes.Element

def leagues = [
        "http://soccernet.espn.go.com/clubs/_/league/eng.1/english-premier-league?cc=4716",
        "http://soccernet.espn.go.com/clubs/_/league/esp.1/spanish-la-liga?cc=4716",
        "http://soccernet.espn.go.com/clubs/_/league/ita.1/italian-serie-a?cc=4716",
        "http://soccernet.espn.go.com/clubs/_/league/ger.1/german-bundesliga?cc=4716",
        "http://soccernet.espn.go.com/clubs/_/league/fra.1/french-ligue-1?cc=4716",
]

leagues.each {leagueUrl ->
    Utils.getDocument(leagueUrl).select("table[class=tablehead]").get(0).select("td:eq(2)").select("a[href]").each {teamStatsUrl ->
        Utils.getDocument(teamStatsUrl.attr("abs:href")).select("tbody").each {playerGroup ->
            playerGroup.select("td:eq(1)").select("a[href]").each {playerLink ->
                Element playerProfile = Utils.getDocument(playerLink.attr("abs:href")).select("div.profile").get(0)
                String playerName = playerProfile.select("h1").text()

                def profileProperties = [:]
                playerProfile.select("li").each {item ->
                    // "Key: value" items become map entries; items without a colon are team names
                    String[] itemProperties = item.text().split(":")
                    if(itemProperties.size() == 1) profileProperties.get("teams", []).add(itemProperties[0])
                    else profileProperties[itemProperties[0]] = itemProperties[1]
                }
                println playerName + " " + profileProperties
            }
        }
    }
}

All that Utils.getDocument(url) does here is call Jsoup.connect(url).get() within a loop, with the number of retries set to 5. The script produces output as follows:

Ramires [Full Name: Ramires, Squad No: 7, Position: Midfielder, Age: 24, Birth Date: Mar 24, 1987, Birth Place: Barra do Piraí, Rio de Janeiro, Brazil, Height: 5' 11'' (1.80m), Weight: 73 kg, teams:[Brazil, Chelsea]]
Frank Lampard [Squad No: 8, Position: Midfielder, Age: 33, Birth Date: Jun 21, 1978, Birth Place: Romford, Height: 6' 0" (1.83m), Weight: 174 lbs (78.7 kg), teams:[England, Chelsea]]
Fernando Torres [Full Name: Fernando Torres, Squad No: 9, Position: Forward, Age: 27, Birth Date: Mar 20, 1984, Birth Place: Fuenlabrada, Madrid, Height: 6' 1'' (1.85m), Weight: 174 lbs (78.7 kg), teams:[Spain, Chelsea]]
John Mikel Obi [Squad No: 12, Position: Midfielder, Age: 24, Birth Date: Apr 22, 1987, Birth Place: Jos, Nigeria, Height: 5' 11'' (1.80m), Weight: 179 lbs (81.3 kg), teams:[Chelsea]]
Raul Meireles [Full Name: Raul Meireles, Squad No: 16, Position: Midfielder, Age: 28, Birth Date: Mar 17, 1983, Birth Place: Porto, Portugal, Height: 1.79m, Weight: 65 kg, teams:[Chelsea, Liverpool, Portugal]]
Branislav Ivanovic [Squad No: 2, Position: Defender, Age: 27, Birth Date: Feb 22, 1984, Birth Place: Sremska Mitrovica, Yugoslavia, Height: 6' 2" (1.88m), Weight: 86 kg, teams:[Serbia, Chelsea]]
Juan Mata [Full Name: Juan Mata, Squad No: 10, Position: Forward, Age: 23, Birth Date: Apr 28, 1988, Birth Place: Burgos, Spain, Height: 1.70m, Weight: 61 kg, teams:[Spain, Valencia, Chelsea, Spain U21]]
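
The retry loop behind Utils.getDocument is not shown in this post. A rough sketch of what such a helper might look like, written in plain Java so it also works from Groovy, is below. Retry and withRetries are hypothetical names, and the sketch takes a Callable so that it does not depend on Jsoup.

```java
import java.util.concurrent.Callable;

public class Retry {
    // Runs the action up to maxAttempts times and returns the first successful result.
    static <T> T withRetries(int maxAttempts, Callable<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;                   // remember the failure and try again
            }
        }
        throw new IllegalStateException("All " + maxAttempts + " attempts failed", last);
    }
}
```

With Jsoup on the classpath, the call would presumably look like Retry.withRetries(5, () -> Jsoup.connect(url).get()).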


Java ArrayList Is Actually Just An Array !

I was reading The Art of Computer Programming Vol 1, the chapter about arrays and lists. Knuth describes the differences and mentions how it’s very easy to add a new element to a linked list in constant time if we have the node where the insertion needs to happen. Same for deletion. That is when I realized I did not know how the ArrayList implementation handles it. I did a code walk through of that class, something Prasun had advised me to do long ago but which I always ignored.

I always knew that ArrayList uses an array for its implementation, which is how it can guarantee O(1) element access time. But I did not know how the other methods worked. For instance, one need not provide any initial size for the ArrayList. How does Java handle it? I remember Prasun telling me that the initial size in such cases is 10. But then what happens in the case of an overflow? What is the new space that gets assigned to the list?

But I was more interested in the add(index, element) and the remove(index) methods. And they are not constant time! They are O(n) operations. Say we want to remove(6). Java actually copies all the elements from indexes 7..n down to indexes 6..n-1 and then sets the nth element to null. This is so not what we learn as deletion in a linked list, where it is just a matter of changing a couple of pointers (the next and prev ones).
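
To make the shifting concrete, here is a stripped-down sketch of an array-backed list. It is my own simplification, not the JDK source. TinyArrayList is a hypothetical name, and both the starting capacity of 10 and the roughly 1.5x growth on overflow are assumptions modelled on what I remember of the real class.

```java
import java.util.Arrays;

public class TinyArrayList {
    private Object[] elements = new Object[10];   // assumed default capacity
    private int size = 0;

    // O(n): everything from index onwards is shifted one slot to the right.
    void add(int index, Object value) {
        if (index < 0 || index > size) throw new IndexOutOfBoundsException("index " + index);
        if (size == elements.length) {
            // Overflow: copy into a bigger array (assumed growth of about 1.5x).
            elements = Arrays.copyOf(elements, size + (size >> 1));
        }
        System.arraycopy(elements, index, elements, index + 1, size - index);
        elements[index] = value;
        size++;
    }

    // O(n): everything after index is shifted one slot to the left.
    Object remove(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        Object removed = elements[index];
        System.arraycopy(elements, index + 1, elements, index, size - index - 1);
        elements[--size] = null;                  // clear the now-unused last slot
        return removed;
    }

    // O(1): plain array indexing.
    Object get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        return elements[index];
    }

    int size() {
        return size;
    }
}
```

Appending at the end (add(size(), x)) shifts nothing, which is why add at the tail stays cheap apart from the occasional resize.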
