Monday, October 31, 2011

Bin2Txt - Remove Non-printable Characters from a Text File with Java

I was recently dealing with some DB2 "unload" (i.e., export) files that I wanted to parse and then load into Oracle. The unload files contain a lot of binary (non-printable) characters, which makes them very difficult to parse. I wrote the following Java class to convert each unprintable character into a tilde (a character that does not occur in my data). The result was DB2 unload files that could be parsed as fixed-width data files.
The main problem this approach does not attempt to solve is that DB2 unload files store numeric fields as raw binary values rather than as digit characters (i.e., the number 84 is unloaded as the single byte 84, which reads as the ASCII character "T", not as the text "84"). The code does not reference the DB2 "punch" (i.e., parse instruction) files, so it makes no attempt to split the files into fields itself; that is a separate exercise in my case. BTW, if there is a good way to import these files into Oracle automatically, please let me know, as I have not been able to find a better solution.
This code is fairly generic, and can be used for other purposes beyond converting DB2 unload files, so if you have a need to replace non-printable characters in text files, you can start with this code base.

package com.threeleaf.bin2txt;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Purpose is to read a file and replace non-printable characters with a given character.
 * Specifically, I want to use this to make DB2 unload files parsable with other applications so
 * that the data can be imported into Oracle.
 *
 * @author John A. Marsh
 * @since 2011-10-27
 */
public final class Bin2Txt {

    /**
     * Run this class from the command line with:
     * java Bin2Txt <pathAndFilename>.
     *
     * @param args
     *        the filename to convert
     * @throws IOException
     *         Signals that an I/O exception (e.g., file not found) has occurred.
     */
    public static void main (final String[] args) throws IOException {
        final byte ASCII_SPACE = 32;
        final byte ASCII_CR = 13;
        final byte ASCII_LF = 10;
        final byte ASCII_TILDE = 126;

        try {
            final File file = new File(args[0]);
            final InputStream inputStream = new FileInputStream(file);
            final long fileLength = file.length();

            /*
             * Array needs to be created with an int type, so need to check to ensure that file is
             * not larger than Integer.MAX_VALUE.
             */
            if (fileLength > Integer.MAX_VALUE) {
                throw new IOException("File is too big");
            }

            /* Create the byte array to hold the data */
            final byte[] bytes = new byte[(int) fileLength];

            /* Read in the bytes */
            int offset = 0;
            int numRead = 0;
            while (offset < bytes.length && (numRead = inputStream.read(bytes, offset, bytes.length - offset)) >= 0) {
                offset += numRead;
            }

            /* Ensure all the bytes have been read in */
            if (offset < bytes.length) {
                throw new IOException("Could not completely read file " + file.getName());
            }
            inputStream.close();

            for (int i = 0; i < bytes.length; i++) {
                if (bytes[i] == ASCII_CR && i + 1 < bytes.length && bytes[i + 1] == ASCII_LF) {
                    /*
                     * Preserve line breaks (carriage return + line feed) by skipping over them.
                     * The bounds check covers a file that ends with a lone carriage return, and
                     * the "continue" lets the next byte be tested for another CRLF.
                     */
                    i++;
                    continue;
                }
                if (bytes[i] < ASCII_SPACE || bytes[i] > ASCII_TILDE) {
                    /*
                     * Replace all non-printable characters. Java bytes are signed, so values
                     * above 127 are negative and are caught by the first comparison.
                     */
                    bytes[i] = ASCII_TILDE;
                }
            }
            /* Output file name will be the same as the input, with ".out.txt" added to the end. */
            final OutputStream outputStream = new FileOutputStream(args[0] + ".out.txt");
            outputStream.write(bytes);
            outputStream.close();
        } catch (final ArrayIndexOutOfBoundsException e) {
            /*
             * If no file was passed on the command line, this exception is generated. A message
             * indicating how the class should be called is displayed.
             */
            System.out.println("Usage: java Bin2Txt filename\n");
        }
    }
}
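
The class above reads the whole file into memory, which is why it has to reject files larger than Integer.MAX_VALUE. A streaming variant avoids that limit. The following is only a minimal sketch, not the code I actually ran: the class name Bin2TxtStream and the convert method are made up for illustration, and as a simplification it passes every CR and LF byte through unchanged instead of preserving only CR+LF pairs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical streaming variant of Bin2Txt; not part of the original class. */
final class Bin2TxtStream {

    /**
     * Copies in to out, replacing every byte outside printable ASCII (32-126) with the
     * replacement byte. Assumption: CR and LF bytes are always passed through unchanged.
     */
    static void convert(final InputStream in, final OutputStream out, final int replacement)
            throws IOException {
        int b;
        /* read() returns 0-255, or -1 at end of stream, so no signed-byte handling is needed. */
        while ((b = in.read()) != -1) {
            final boolean lineBreak = b == '\r' || b == '\n';
            final boolean printable = b >= 32 && b <= 126;
            out.write(lineBreak || printable ? b : replacement);
        }
    }

    public static void main(final String[] args) throws IOException {
        /* Made-up sample data: two printable letters around a NUL and a high byte. */
        final byte[] sample = {'A', 0, 'B', (byte) 0xC3, '\r', '\n', 'C'};
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        convert(new ByteArrayInputStream(sample), out, '~');
        System.out.println(out.toString("US-ASCII"));
    }
}
```

For real files you would pass in a BufferedInputStream and BufferedOutputStream wrapped around the file streams, since reading one byte at a time from an unbuffered FileInputStream is slow.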

Here is a batch file that will convert all the files in a given directory:

:: Directory where Bin2Txt.class is located ::
cd C:\projects\workspace\bin2txt\bin\
:: Put in directory where unload files are ::
for %%f in ("C:\projects\Database\Unloads\*.txt") do call java com.threeleaf.bin2txt.Bin2Txt "%%f"
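
On the numeric-field problem mentioned earlier: if you know from the punch file where a binary field sits, you can decode it yourself before the non-printable bytes are overwritten. This is only a sketch under loud assumptions: the class and method names are made up, and the 2-byte width, the offset, and the big-endian byte order are guesses for illustration. The real layout has to come from your own punch file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; the field offset, width, and byte order are assumptions. */
final class BinaryFieldSketch {

    /** Reads a 2-byte big-endian signed integer starting at the given offset. */
    static short readSmallInt(final byte[] record, final int offset) {
        return ByteBuffer.wrap(record, offset, 2).order(ByteOrder.BIG_ENDIAN).getShort();
    }

    public static void main(final String[] args) {
        /* A made-up record: two text bytes followed by the value 84 stored in binary. */
        final byte[] record = {'A', 'B', 0x00, 0x54};
        /* The low byte viewed as text is the letter T. */
        System.out.println((char) record[3]);
        /* The same two bytes decoded as a number give 84. */
        System.out.println(readSmallInt(record, 2));
    }
}
```

Note that this has to run against the original bytes. Once Bin2Txt has replaced the 0x00 byte with a tilde, the numeric value can no longer be recovered.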

Thursday, October 13, 2011

Solved: Firefox Tab Icons Are Missing

I have been frustrated over the last few weeks because I could not figure out why the tab icons (favicons) suddenly stopped showing up in Firefox. I scoured the web and tried several suggestions, but none of them worked for me. I finally got clued in that it might be one of my extensions, and sure enough, going to Help » Restart With Add-ons Disabled caused the icons to reappear. After a little trial and error I found that the culprit was Favicon Picker 2. I wanted the functionality, so I tried Favicon Picker 3, but it caused the same problem. Eventually I found that Bookmark Favicon Changer (https://addons.mozilla.org/en-US/firefox/addon/bookmark-favicon-changer/) allowed me to customize my bookmark icons without causing the tabs to go blank. So, if you find that the favicons in your tabs are absent, check your extensions first, and if you need custom bookmark icons, give Bookmark Favicon Changer a try. As of this writing, I am using Firefox 7 and Bookmark Favicon Changer 1.54.

Tuesday, September 13, 2011

Interning String Literals In Java

A coworker and I were discussing strings, and some of the things we had heard about how the Java compiler stores and retrieves them from memory. We were particularly wondering if constants, local variables, literals, and runtime generated strings were handled differently. I wrote the following JUnit 4 test to show how all these cases are handled (Java JDK 1.6.0_27):

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

/**
 * Test string interning effects on literal and runtime strings.
 * Constants.DATE is an external constant holding the literal "date".
 */
public final class StringInternTest {

    /**
     * String intern test.
     */
    @Test
    public void stringIntern () {
        /* literals */
        assertEquals("date", "date");
        assertTrue("date" == "date");
        /* Compare to an external constant */
        assertEquals("date", Constants.DATE);
        assertTrue("date" == Constants.DATE);
        /* Compare locally defined strings. */
        final String myDateString = "date";
        assertEquals("date", myDateString);
        assertTrue("date" == myDateString);
        final String myDateString2 = "date";
        assertEquals(myDateString, myDateString2);
        assertTrue(myDateString == myDateString2);
        assertEquals(Constants.DATE, myDateString);
        assertEquals(Constants.DATE, myDateString2);
        assertTrue(Constants.DATE == myDateString);
        assertTrue(Constants.DATE == myDateString2);
        /* Create new strings at runtime. */
        final String runtime1 = new String("date");
        final String runtime2 = new String("date");
        assertEquals("date", runtime1);
        assertEquals("date", runtime2);
        assertEquals(Constants.DATE, runtime1);
        assertEquals(Constants.DATE, runtime2);
        assertEquals(runtime1, runtime2);
        assertFalse(runtime1 == runtime2); // !!!
        assertFalse("date" == runtime1); // !!!
        assertFalse("date" == runtime2); // !!!
        assertFalse(Constants.DATE == runtime1); // !!!
        assertFalse(Constants.DATE == runtime2); // !!!
        /* Intern the runtime strings. */
        final String interned1 = runtime1.intern();
        final String interned2 = runtime2.intern();
        assertTrue(interned1 == interned2);
        assertTrue("date" == interned1);
        assertTrue("date" == interned2);
        assertTrue(Constants.DATE == interned1);
        assertTrue(Constants.DATE == interned2);
    }
}

If you set a breakpoint in this test and examine the variables, you will find that Constants.DATE, myDateString, myDateString2, interned1, and interned2 all have the same internal object ID. I learned online that string literals (and other compile-time constant strings) are interned automatically, so identical literals refer to the same String object even when they appear in different classes. That accounts for the variables having the same ID.

The strings stored in runtime1 and runtime2 are each given a unique object ID when instantiated at runtime, and thus == returns false when either is compared with any of the interned strings or with the other.

Some have suggested that manually interning is better because using == is much faster (a figure of 5x is quoted) than equals(), since comparing object IDs is faster than comparing string lengths and characters. However, others dismiss this as a minor gain at best, and I can certainly see that using == might lead to some hard-to-find bugs if a non-interned string is compared to either an interned or another non-interned string. I also learned that main argument strings (args[]) are not interned.

In the Java 6 HotSpot JVM I tested with, interned strings are stored in PermGen (Permanent Generation) memory, and it is possible to fill up that space with strings if one is not careful.
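
One case the test above does not cover is concatenation. The following is a small sketch (the class and method names are made up for illustration) showing that a concatenation of two literals is folded into a compile-time constant and therefore interned, while a concatenation involving a method parameter is built at runtime and is a distinct object until intern() is called on it.

```java
/** Hypothetical demo class; compares references on purpose to show interning. */
final class InternSketch {

    /** Concatenation of two literals is a compile-time constant, so it is interned. */
    static boolean constantConcatIsSame() {
        return ("da" + "te") == "date";
    }

    /** Concatenation with a method parameter happens at runtime and yields a new object. */
    static boolean runtimeConcatIsSame(final String suffix) {
        return ("da" + suffix) == "date";
    }

    /** Interning the runtime result returns the pooled instance shared with the literal. */
    static boolean internedConcatIsSame(final String suffix) {
        return ("da" + suffix).intern() == "date";
    }

    public static void main(final String[] args) {
        System.out.println(constantConcatIsSame());       // true
        System.out.println(runtimeConcatIsSame("te"));    // false
        System.out.println(internedConcatIsSame("te"));   // true
    }
}
```

This is the kind of situation where relying on == gets risky: two strings with the same characters can compare as different depending on how they were built.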


Thursday, September 08, 2011

SOLUTION: "SQL driver not found org.apache.derby.jdbc.ClientDriver" When Using Oracle with Sonar and Maven

I am new to Sonar and Maven, and have been perplexed for several hours with the "SQL driver not found org.apache.derby.jdbc.ClientDriver" error I was getting when running mvn sonar:sonar from the Windows command line after starting the Sonar server.

What is not clear in the instructions is that you must modify both <sonar path>\conf\sonar.properties and <maven path>\conf\settings.xml to get this to work. The sonar.properties file tells the Sonar server which database driver to use, and the Maven settings file tells Maven which database driver to use when it runs Sonar.

Here are the changes I made to connect to my local Oracle XE instance:

sonar.properties

...

sonar.jdbc.username: sonar
sonar.jdbc.password: sonar

...

# Comment the following lines to deactivate the default embedded database.
#sonar.jdbc.url: jdbc:derby://localhost:1527/sonar;create=true
#sonar.jdbc.driverClassName: org.apache.derby.jdbc.ClientDriver
#sonar.jdbc.validationQuery: values(1)

...

sonar.jdbc.url: jdbc:oracle:thin:@localhost:1521:xe
sonar.jdbc.driverClassName: oracle.jdbc.driver.OracleDriver
sonar.jdbc.validationQuery: select 1 from dual

settings.xml

<profiles>
  ...
  <profile>
    <id>sonar</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <sonar.jdbc.url>jdbc:oracle:thin:@localhost:1521:xe</sonar.jdbc.url>
      <sonar.jdbc.driver>oracle.jdbc.driver.OracleDriver</sonar.jdbc.driver>
      <sonar.jdbc.username>sonar</sonar.jdbc.username>
      <sonar.jdbc.password>sonar</sonar.jdbc.password>
      <sonar.host.url>http://localhost:9000</sonar.host.url>
    </properties>
  </profile>
</profiles>
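
If the error persists after both files are changed, one related thing worth ruling out is whether the configured driver class is visible on the classpath at all. The following is a generic sketch, not something that ships with Sonar or Maven: the DriverCheck class is made up for illustration, and it only tells you whether the class can be loaded by whatever classpath you run it with (for example, one that includes your Oracle JDBC JAR).

```java
/** Hypothetical diagnostic; not part of Sonar or Maven. */
final class DriverCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isLoadable(final String className) {
        try {
            Class.forName(className);
            return true;
        } catch (final ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        /* Defaults to the driver class name used in the settings above. */
        final String driver = args.length > 0 ? args[0] : "oracle.jdbc.driver.OracleDriver";
        System.out.println(driver + (isLoadable(driver) ? " is loadable" : " was not found"));
    }
}
```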

Tuesday, September 06, 2011

SOLUTION: Modify PMD's ShortVariable Rule To Ignore ID Fields

Several people online have asked about the problem of not being able to add exceptions to PMD's ShortVariable rule. Most often, it has been the desire to get PMD to ignore the case where the variable name is 'id'. I was not able to find a solution online, but I worked with the XPath until I came up with a working solution.

  1. In Eclipse, go to Window » Preferences » PMD » Rules configuration » ShortVariable » Edit Rule...
  2. Change the XPath field to:
    //VariableDeclaratorId[(string-length(@Image) < 3) and (not (@Image='id'))]
    [not(ancestor::ForInit)]
    [not((ancestor::FormalParameter) and (ancestor::TryStatement))]
  3. Click Apply » Confirm rebuild » Click OK » Confirm rebuild
  4. Restart Eclipse
  5. Right-click on the project » PMD » Check code with PMD
    1. You should then see the warning markers disappear on your id fields.
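
To make the effect of the modified XPath concrete, here is a small sketch of which declarations it should and should not match. The class is made up for illustration, and the comments reflect my reading of the XPath expression rather than actual PMD output for this file.

```java
/** Hypothetical class illustrating which declarations the modified XPath matches. */
final class ShortVariableSketch {

    private final long id;   // two characters, but skipped by not(@Image='id')
    private final int qt;    // two characters: should still be reported

    ShortVariableSketch(final long id, final int quantity) {
        this.id = id;
        this.qt = quantity;
    }

    long total() {
        long sum = id;
        for (int i = 0; i < qt; i++) {   // 'i' is skipped by not(ancestor::ForInit)
            sum += i;
        }
        try {
            return Long.parseLong(Long.toString(sum));
        } catch (final NumberFormatException e) {   // 'e' is skipped: FormalParameter inside a TryStatement
            return -1;
        }
    }

    public static void main(final String[] args) {
        System.out.println(new ShortVariableSketch(10L, 3).total());
    }
}
```

The same pattern can be extended to other names by adding further conditions such as and (not (@Image='pk')) inside the first predicate.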