January 28, 2010

Parsing Excel Sheet made easy with Annotations and POI

HSSF - Horrible Spreadsheet Format – not anymore. With few annotations, excel parsing can be done in one line.

We had a requirement in our current project to parse multiple excel sheets and store the information to database. I hope most of the projects involving excel sheet parsing would be doing the same. We built a extensible framework to parse multiple sheets and populate JAVA objects with annotations.

We will discuss the steps to use annotations to parse excel sheet and populate Java objects.

Consider we have an excel sheet with student information.
Parsing Logic:
While parsing this excel sheet, we need to populate one “Section” object and multiple “Student” objects related to a Section. You can see that Student information is available in multiple rows whereas the Section details (Year, Section) is available in column B.

We will have to annotate the above information to the domain class, that can be interpretted by the sheet parser.

1) Annotate Domain Class:
First we will see the steps to annotate Section object:


@ExcelObject(parseType = ParseType.COLUMN, start = 2, end = 2)
public class Section {
 @ExcelField(position = 2)
 private String year;

 @ExcelField(position = 3)
 private String section;
 
 @MappedExcelObject
 private List <Student> students;
}
You can find three different annotation in this class.
i)ExcelObject: This annotation tells the parser about the parse type (Row or Column), number of objects to create (start, end). Based on the above annotation, Section value should be parsed Columnwise and information can be found in Column 2 (“B”) of the Excelsheet.

ii)ExcelField: This annotation tells the parser to fetch “year” information from Row 2 and “section” information from Row 3.

iii)MappedExcelObject: Apart from Simple datatypes like “Double”,”String”, we might also try to populate complex java objects while parsing. In this case, each section has a list of student information to be parsed from excel sheet. This annotation will help the parser in identifying such fields.

Next step is to annotate Student class:


@ExcelObject(parseType = ParseType.ROW, start = 6, end = 8)
public class Student {
 @ExcelField(position = 2)
 private Long roleNumber;
 
 @ExcelField(position = 3)
 private String name;
 
 @ExcelField(position = 4)
 private Date dateOfBirth;
 
 @ExcelField(position = 5)
 private String fatherName;
 
 @ExcelField(position = 6)
 private String motherName;
 
 @ExcelField(position = 7)
 private String address;
 
 @ExcelField(position = 8)
 private Double totalScore;
}
i) ExcelObject: As shown above, this annotation tells parser to parse Rows 6 to 8 (create 3 student objects). NOTE: Optional field “zeroIfNull” , if set to true, will populate Zero to all number fields (Double,Long,Integer) by default if the data is not available in DB.

ii) ExcelField: Student class has 7 values to be parsed and stored in the database. This is denoted in the domain class as annotation.

iii) MappedExcelObject: Student class does not have any complex object, hence this annoation is not used in this domain class.

Once the annotation is done, you have just invoke the parser with the Sheet and the Root class you want to populate.

//Get the sheet using POI API.
String sheetName = "Sheet1";
SheetParser parser = new SheetParser();
InputStream inputStream = getClass().getClassLoader().getResourceAsStream("Student Profile.xls");
Sheet sheet = new HSSFWorkbook(inputStream).getSheet(sheetName);

//Invoke the Sheet parser.
List
entityList = parser.createEntity(sheet, sheetName, Section.class);

Thats all it requires. Parser would populate all the fields based on the annotation for you.

You can find the source code in Github

January 07, 2010

Crack PDF password - Java Brute force algorithm

Recently I was in a big mess when I lost the password for an important PDF document. I ended up using the Brute force algorithm and cracked the PDF password.

PDFBox API is being used in the below code to open the PDF file. Please download the dependent JAR files from here.


Word of caution: Based on the length of the password, it might even take few days to run it.


package com.bruteforce;

import java.io.IOException;

import org.apache.pdfbox.exceptions.CryptographyException;
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;

public class BruteForceDecryptPDF {

 private static final String PDF_FILE_PATH = "c:/file2.pdf";
 private static final int PASSWORD_MAX_LENGTH = 4;
 private static final int PASSWORD_MIN_LENGTH = 2;
 // Array with characters to use in Brute Force Algorithm.
 // You can remove or add more characters in this array.
 private static char fCharList[] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
   'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
   'v', 'w', 'x', 'y', 'z','0','1','2','3','4','5','6','7','8',
   '9'};

 public static void main(String[] args) {
  long startTime=System.currentTimeMillis();
  PDDocument document = null;
  boolean found = false;
  try {
   document = PDDocument.load(PDF_FILE_PATH);

   for (int i = PASSWORD_MIN_LENGTH; i <= PASSWORD_MAX_LENGTH; i++) { 
    try {
     startBruteForce(i, document);
    }catch(PasswordFoundInterrupter ex){
     found=true;
     System.out.println("SUCCESS. PASSWORD: - " + ex.getMessage());
    }
    catch (Exception e) {
     e.printStackTrace();
    }
   }
   if(!found){
    System.out.println("Failure - Password not found");
   }
   document.close();
   long timeTaken=System.currentTimeMillis()-startTime;
   System.out.println("TOTAL TIME TAKEN - " + timeTaken);
  } catch (IOException e) {
   System.out.println("PDF File not found at - "+ PDF_FILE_PATH);
   throw new RuntimeException(e);
  }
 }

 public static void startBruteForce(int length, PDDocument document) {
  StringBuffer sb = new StringBuffer(length);
  char currentChar = fCharList[0];

  for (int i = 1; i <= length; i++) {
   sb.append(currentChar);
  }

  changeCharacters(0, sb, length, document);
 }

 private static StringBuffer changeCharacters(int pos, StringBuffer sb,
   int length, PDDocument document) {
  for (int i = 0; i <= fCharList.length - 1; i++) {
   sb.setCharAt(pos, fCharList[i]);
   if (pos == length - 1) {    
    // Uncomment this to see the generated passwords. It may slow down if you are trying for larger size passwords.
    //System.out.println(sb.toString());
    if(valid(sb.toString(),document)){
     throw new BruteForceDecryptPDF.PasswordFoundInterrupter(sb.toString());
    }
   } else {
    changeCharacters(pos + 1, sb, length, document);
   }
  }

  return sb;
 }
 
 public static boolean valid(String password, PDDocument document) {
  System.out.println(password);
  try {
   document.decrypt(password);
   System.out.println("Success - " + password);
   return true;
  } catch (IOException e) {
   throw new RuntimeException(e);
  } catch (CryptographyException e) {
   return false;
  } catch (InvalidPasswordException e) {
   return false;
  }
 }
 
 private static class PasswordFoundInterrupter extends RuntimeException{
  public PasswordFoundInterrupter(String message) {
   super(message);
  }  
 }
}