Introduction
I have been adding automated tests to projects that I have written at work. One of these projects outputs CEF-formatted messages to syslog. The test needed to read /var/log/messages and compare the CEF message with a known good message. I ran into a problem while reading the file.
What Problem?
Even a novice Java developer knows that reading a text file is no big deal: chain the right streams together and the contents are easily extracted. I wanted to read the file line by line, so I wrapped a java.io.FileReader in a java.io.BufferedReader. Everything should work, right? Indeed, each line was dutifully read and displayed in my log except the last line, the exact line that I was trying to compare. I found the root cause in the javadocs themselves. Here is the excerpt from BufferedReader’s javadoc:
Reads a line of text. A line is considered to be terminated by any one of a line feed (‘\n’), a carriage return (‘\r’), or a carriage return followed immediately by a linefeed.
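The stream chaining described above looks roughly like the sketch below. A StringReader stands in for the FileReader so the example runs without touching /var/log/messages, and the names LineChainExample and readLines are made up for illustration, not taken from the project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineChainExample {
    /** Wraps the given reader in a BufferedReader and collects every line it returns. */
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() returns null once there is nothing more to read
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In the real test the source would be new FileReader("/var/log/messages").
        Reader source = new StringReader("first entry\nsecond entry\n");
        for (String line : readLines(source)) {
            System.out.println(line);
        }
    }
}
```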
What To Do, What To Do
The excerpt does not list end of file as a line terminator, so I needed to change what java.io.BufferedReader treats as the end of a line. My first thought was to extend BufferedReader, so I started down the checklist. The class is not final, but the underlying buffer is not visible to subclasses. That is a problem because the only way to get at the buffered data is through the java.io.Reader interface. My initial plan was to override the readLine method and use the read() method to fetch the characters one at a time and parse each line. Making repeated method calls just to access a buffer seemed wasteful. It would be better to read the buffer directly and increment a counter, which meant that I would basically have to recreate BufferedReader. “This should be no problem. I am an experienced developer. This will be easy!” I thought to myself.
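For comparison, the first idea (override readLine and pull characters through read() one at a time) might look something like the sketch below. This is not the code from the project. The class name EofLineReader and the skipLf field are hypothetical, and the sketch assumes that end of file should end the final line just as a line feed or carriage return would.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class EofLineReader extends BufferedReader {
    // Set when a line ended with '\r', so a directly following '\n' is dropped.
    private boolean skipLf;

    EofLineReader(Reader in) {
        super(in);
    }

    @Override
    public String readLine() throws IOException {
        int c = read();
        if (skipLf && c == '\n') {
            c = read(); // second half of a "\r\n" pair
        }
        skipLf = false;
        if (c == -1) {
            return null; // nothing left at all
        }
        StringBuilder line = new StringBuilder();
        // Stop at '\n', '\r' or end of file, whichever comes first.
        while (c != -1 && c != '\n' && c != '\r') {
            line.append((char) c);
            c = read();
        }
        skipLf = (c == '\r');
        return line.toString();
    }

    public static void main(String[] args) throws IOException {
        try (EofLineReader reader = new EofLineReader(new StringReader("one\r\ntwo\nthree"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The cost is one method call per character, which is the overhead that seemed wasteful at the time.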
A Lesson Learned
Four days and 26 automated tests later, the beast was done. While it is more efficient to use a counter on an array than to make repeated method calls, it did not make good time-management sense. Remember, this was for an automated test, not code destined to be delivered to the customer. A second or two extra per test run does not matter in the long run. Getting it done in a quarter of the time so the tests can be run does matter. In my mind, the sooner a test is written, the more time is saved, and time is money.
The Solution
In theory, the concept is simple: the buffer is filled from the inner reader. When the buffered characters run out, any unread leftovers are moved to the front of the buffer and the remaining space is filled with the next set of data from the inner reader. This repeats until the inner reader returns -1. The main methods are skip(long n), read(char[] cbuf, int off, int length), readLine() and fillBuffer().
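The cycle above rests on the Reader contract that read returns -1 once the source is exhausted. A stripped-down sketch of that loop, with nothing left over between fills, could look like this. FillLoopExample and drain are illustrative names, not part of the project.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FillLoopExample {
    /** Pulls everything out of the reader through a fixed-size buffer. */
    static String drain(Reader in, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        StringBuilder out = new StringBuilder();
        int lenRead;
        // Each pass refills the buffer; -1 signals that the inner reader is done.
        while ((lenRead = in.read(buffer, 0, buffer.length)) != -1) {
            out.append(buffer, 0, lenRead);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(new StringReader("hello world"), 4));
    }
}
```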
skip(long n)
    @Override
    public long skip(long n) throws IOException {
        long numSkipped = 0;
        long leftToSkip = n;
        int lenRead = 0;
        while (leftToSkip > 0 && lenRead != -1) {
            // The comparison here was garbled in the published listing;
            // refilling once the buffer is drained is a reconstruction.
            if (offset == endIndex) {
                lenRead = fillBuffer();
            }
            if (lenRead != -1) {
                // Skip as much as is buffered, up to what is still owed.
                int amountBuffered = endIndex - offset;
                long amountToSkip = Math.min(amountBuffered, leftToSkip);
                offset += amountToSkip;
                numSkipped += amountToSkip;
                leftToSkip -= amountToSkip;
            }
        }
        return numSkipped;
    }
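The behavior a replacement skip has to match can be checked against the stock classes. The sketch below runs two skips on a standard BufferedReader and reports how many characters each call actually skipped. SkipContractExample and skipTwice are made-up names for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class SkipContractExample {
    /** Calls skip twice and returns the count reported by each call. */
    static long[] skipTwice(Reader in, long first, long second) throws IOException {
        long skippedFirst = in.skip(first);
        long skippedSecond = in.skip(second);
        return new long[] {skippedFirst, skippedSecond};
    }

    public static void main(String[] args) throws IOException {
        Reader in = new BufferedReader(new StringReader("abcdef"));
        long[] counts = skipTwice(in, 4, 10);
        // The second request runs past the end, so it reports fewer than asked for.
        System.out.println(counts[0] + " then " + counts[1]);
    }
}
```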
read(char[] cbuf, int off, int length)
    @Override
    public int read(char[] cbuf, int off, int length) throws IOException {
        if (length == 0) {
            return 0; // a zero-length request reads nothing
        }
        int totalRead = 0;
        int targetOffset = off;
        int leftToCopy = length;
        int readLen = 0;
        while (leftToCopy > 0 && readLen != -1) {
            // Refill only once everything buffered has been copied out.
            if (offset == endIndex) {
                readLen = fillBuffer();
            }
            if (readLen != -1) {
                int amountBuffered = endIndex - offset;
                int lenToCopy = Math.min(amountBuffered, leftToCopy);
                System.arraycopy(buffer, offset, cbuf, targetOffset, lenToCopy);
                offset += lenToCopy;
                targetOffset += lenToCopy;
                leftToCopy -= lenToCopy;
                totalRead += lenToCopy;
            }
        }
        // Report end of stream only when nothing at all was copied.
        return (totalRead == 0) ? -1 : totalRead;
    }
readLine()
    public String readLine() throws IOException {
        final char LINEFEED = '\n';
        final char CR = '\r';
        StringBuilder line = new StringBuilder();
        boolean foundTerminator = false;
        int readLen = 0;
        if (offset == endIndex) {
            readLen = fillBuffer();
        }
        while (!foundTerminator && readLen != -1) {
            char c = buffer[offset++];
            if (c == CR) {
                foundTerminator = true;
                // The LF of a CR/LF pair may sit in the next buffer load.
                if (offset == endIndex) {
                    readLen = fillBuffer();
                }
                if (readLen != -1 && buffer[offset] == LINEFEED) {
                    offset++;
                }
            } else if (c == LINEFEED) {
                foundTerminator = true;
            } else {
                line.append(c);
            }
            if (offset == endIndex) {
                readLen = fillBuffer();
            }
        }
        // EOF ends the last line. Return null only when nothing was read,
        // so that a blank line in the middle of the file is not mistaken for EOF.
        if (line.length() == 0 && !foundTerminator) {
            return null;
        }
        return line.toString();
    }
fillBuffer()
    private int fillBuffer() throws IOException {
        // Slide any unread characters to the front so the free space is contiguous.
        moveLeftoverToBeginning();
        endIndex = endIndex - offset;
        offset = 0;
        // Fill whatever room is left after the leftovers.
        int length = bufferSize - endIndex;
        int lenRead = in.read(buffer, endIndex, length);
        if (lenRead != -1) {
            endIndex += lenRead;
        }
        return lenRead;
    }
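fillBuffer() leans on fields and a helper that are not shown in this entry. The sketch below is one guess at how they could fit together, not the project's actual code. The body of moveLeftoverToBeginning is an assumption, and consume and buffered are hypothetical helpers added only so the state can be inspected.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CompactingBufferSketch {
    private final Reader in;
    private final int bufferSize;
    private final char[] buffer;
    private int offset;   // index of the next unread character
    private int endIndex; // one past the last valid character

    CompactingBufferSketch(Reader in, int bufferSize) {
        this.in = in;
        this.bufferSize = bufferSize;
        this.buffer = new char[bufferSize];
    }

    /** Assumed behavior: copy the unread tail of the buffer down to index 0. */
    private void moveLeftoverToBeginning() {
        System.arraycopy(buffer, offset, buffer, 0, endIndex - offset);
    }

    int fillBuffer() throws IOException {
        moveLeftoverToBeginning();
        endIndex = endIndex - offset;
        offset = 0;
        int lenRead = in.read(buffer, endIndex, bufferSize - endIndex);
        if (lenRead != -1) {
            endIndex += lenRead;
        }
        return lenRead;
    }

    /** Hypothetical helper: marks count characters as consumed. */
    void consume(int count) {
        offset += count;
    }

    /** Hypothetical helper: the characters currently waiting in the buffer. */
    String buffered() {
        return new String(buffer, offset, endIndex - offset);
    }

    public static void main(String[] args) throws IOException {
        CompactingBufferSketch sketch =
                new CompactingBufferSketch(new StringReader("abcdefgh"), 4);
        sketch.fillBuffer();
        sketch.consume(3);   // leave one character unread
        sketch.fillBuffer(); // the leftover moves to the front, then the rest is refilled
        System.out.println(sketch.buffered());
    }
}
```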
Conclusion
This blog entry walked through a custom BufferedReader that treats EOF as a line terminator, built to make it easier to verify the output of a CEF-formatted syslog message. To see the rest of this BufferedReader and its tests, get the Maven project via git at https://github.com/darylmathison/buffered-reader-example.