Thursday, March 3, 2011

Java: CharBuffer vs. char[]

Is there any reason to prefer a CharBuffer to a char[] in the following:

CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
while( in.read(buf) >= 0 ) {
  out.append( buf.flip() );
  buf.clear();
}

vs.

char[] buf = new char[DEFAULT_BUFFER_SIZE];
int n;
while( (n = in.read(buf)) >= 0 ) {
  out.write( buf, 0, n );
}

(where in is a Reader and out is a Writer)?
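
For reference, the two loops can be wrapped in a small self-contained class so they can be run side by side. This is only a sketch: the class name CopyLoops, the method names, and the buffer size of 8192 are assumptions made for illustration and are not part of the question. Before Java 9, CharBuffer.flip() was declared to return Buffer, so the call is written as its own statement here instead of being passed straight to append().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.CharBuffer;

public class CopyLoops {
    // Hypothetical value; the question does not say what DEFAULT_BUFFER_SIZE is.
    static final int DEFAULT_BUFFER_SIZE = 8192;

    // CharBuffer variant of the copy loop.
    static void copyWithCharBuffer(Reader in, Writer out) throws IOException {
        CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
        while (in.read(buf) >= 0) {
            buf.flip();       // switch the buffer from filling to draining
            out.append(buf);  // appends the chars between position and limit
            buf.clear();      // make the whole buffer available for the next read
        }
    }

    // char[] variant of the copy loop.
    static void copyWithArray(Reader in, Writer out) throws IOException {
        char[] buf = new char[DEFAULT_BUFFER_SIZE];
        int n;
        while ((n = in.read(buf)) >= 0) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "The quick brown fox jumps over the lazy dog.";
        StringWriter viaBuffer = new StringWriter();
        StringWriter viaArray = new StringWriter();
        copyWithCharBuffer(new StringReader(text), viaBuffer);
        copyWithArray(new StringReader(text), viaArray);
        // Both variants should reproduce the input unchanged.
        System.out.println(viaBuffer.toString().equals(viaArray.toString()));
    }
}
```

Both methods take any Reader and Writer, so the same class can be pointed at files or sockets when comparing the two styles.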

From Stack Overflow
  • If this is the only thing you're doing with the buffer, then the array is probably the better choice in this instance.

    CharBuffer has lots of extra chrome on it, but none of it is relevant in this case - and will only slow things down a fraction.

    You can always refactor later if you need to make things more complicated.

    coppro : That's a bad philosophy. Get it right the first time.
    Ed Swangren : This is right for the current requirement. Requirements change.
    Michael Rutherfurd : Using a standard implementation (dare I say pattern :-) ) results in less error-prone code. Doesn't mean that using an array is wrong or buggy, it is just more likely to be so.
    Bill Michell : For the record, I disagree with "get it right first time". Get it right *for this requirement* the first time, while having confidence that when the requirement changes I can change things then, is much better provided you can make the environment support that philosophy.
  • No, there's really no reason to prefer a CharBuffer in this case.

    In general, though, CharBuffer (and ByteBuffer) can really simplify APIs and encourage correct processing. If you were designing a public API, it's definitely worth considering a buffer-oriented API.

  • The CharBuffer version is slightly less complicated (one less variable), encapsulates buffer size handling and makes use of a standard API. Generally I would prefer this.

    However, there is still one good reason to prefer the array version, in some cases at least. CharBuffer was only introduced in Java 1.4, so if you are deploying to an earlier version you can't use CharBuffer (unless you roll your own or use a backport).

    P.S. If you use a backport, remember to remove it once you catch up to the version containing the "real" version of the backported code.

  • I think that CharBuffer and ByteBuffer (as well as any other xBuffer) were meant for reusability, so you can buf.clear() them instead of going through reallocation every time.

    If you don't reuse them, you're not using their full potential and they will add extra overhead. However, if you're planning on scaling this function, it might be a good idea to keep them there.

    Jonathan : You can reuse arrays.
    James Schek : The full potential of buffers is the direct buffers and the ability to change data representations easily. You can use ByteBuffer's asXxxBuffer() views (asCharBuffer(), asIntBuffer(), and so on) to reinterpret bytes as chars or numbers on the fly. You can also change the byte order.
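
The view buffers and byte-order switch mentioned in the last comment can be sketched as follows. The class name BufferViews and its helper methods are hypothetical names chosen for this illustration; only ByteBuffer.wrap(), order(), asIntBuffer() and asCharBuffer() come from java.nio.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class BufferViews {
    // Reads the first four bytes as an int using the given byte order.
    static int firstInt(byte[] bytes, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(order);
        IntBuffer ints = bb.asIntBuffer(); // a view over the same bytes, no copy
        return ints.get(0);
    }

    // Reinterprets the bytes as UTF-16 code units (big-endian is the default order).
    static String asChars(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x00, 0x01, 0x02};
        System.out.println(firstInt(raw, ByteOrder.BIG_ENDIAN));    // 258
        System.out.println(firstInt(raw, ByteOrder.LITTLE_ENDIAN)); // 33619968
        System.out.println(asChars(new byte[] {0x00, 0x48, 0x00, 0x69})); // Hi
    }
}
```

The same four bytes yield different numbers depending on the order set before the view is created, which is the kind of representation change a plain array does not offer.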
  • I wanted to mini-benchmark this comparison.

    Below is the class I have written.

    The thing is I can't believe that the CharBuffer performed so badly. What have I got wrong?

    EDIT: Following the 11th comment below, I have updated the code and the timings. Performance is better all round, but there is still a significant difference between the two. I also tried the out2.append((CharBuffer)buff.flip()) option mentioned in the comments, but it was much slower than the write option used in the code below.

    Results: (time in ms)
    char[] : 3411
    CharBuffer: 5653

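    import java.io.StringReader;
    import java.io.StringWriter;
    import java.nio.CharBuffer;
    import java.util.Date;

    public class CharBufferScratchBox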
    {
        public static void main(String[] args) throws Exception
        {
            // Some Setup Stuff
            String smallString =
                    "1111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000";
    
            StringBuilder stringBuilder = new StringBuilder();
            for (int i = 0; i < 1000; i++)
            {
                stringBuilder.append(smallString);
            }
            String string = stringBuilder.toString();
            int DEFAULT_BUFFER_SIZE = 1000;
            int ITTERATIONS = 10000;
    
            // char[]
            StringReader in1 = null;
            StringWriter out1 = null;
            Date start = new Date();
            for (int i = 0; i < ITTERATIONS; i++)
            {
                in1 = new StringReader(string);
                out1 = new StringWriter(string.length());
    
                char[] buf = new char[DEFAULT_BUFFER_SIZE];
                int n;
                while ((n = in1.read(buf)) >= 0)
                {
                    out1.write(
                            buf,
                            0,
                            n);
                }
            }
            Date done = new Date();
            System.out.println("char[]    : " + (done.getTime() - start.getTime()));
    
            // CharBuffer
            StringReader in2 = null;
            StringWriter out2 = null;
            start = new Date();
            CharBuffer buff = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
            for (int i = 0; i < ITTERATIONS; i++)
            {
                in2 = new StringReader(string);
                out2 = new StringWriter(string.length());
                int n;
                while ((n = in2.read(buff)) >= 0)
                {
                    out2.write(
                            buff.array(),
                            0,
                            n);
                    buff.clear();
                }
            }
            done = new Date();
            System.out.println("CharBuffer: " + (done.getTime() - start.getTime()));
        }
    }
    
    Alnitak : On my 2007 MacBookPro with Java 1.6 the second version is only(!) 35% slower - 2700ms vs 2000ms.
    Ron Tuffin : Yea. My times above were using 1.5; the times I get for 1.6 are faster. About 35% as you (@Alnitak) report.
    Chris Conway : Why write(buf.array(),0,n) instead of write(buf.flip())?
    Alnitak : Because StringWriter.write(Buffer) doesn't exist.
    Chris Conway : My bad. Why not append(buf.flip())?
    Alnitak : because Writer.append() doesn't exist either - the .append() method is only in the StringWriter subclass.
    Chris Conway : Writer implements Appendable since JDK1.5
    Alnitak : so it does - my bad.
    Ron Tuffin : @Chris Conway: I used write(buff.array(),0,n) because I wanted to eliminate as many differences as possible between the two.
    Ron Tuffin : ok so I replaced out2.write(buff.array(),0,n); with out2.append((CharBuffer)buff.flip()); and that made the time comparison worse, an increase of 135% - Bah! Go with a char[], it is clearly faster in this case. :)
    James Schek : Your micro-benchmark measures too many things. StringWriter is allocated without an argument so it has to resize itself. StringWriter is backed by StringBuffer and defaults to 16. Try allocating it with the argument string.length().
    Ron Tuffin : Thanks @James Schek. I have done that and updated the 'answer'. Much better performance overall.
  • The difference, in practice, is actually <10%, not 30% as others are reporting.

    These are my numbers, taken with a profiler, for reading and writing a 5MB file 24 times. On average:

    char[] = 4139 ms
    CharBuffer = 4466 ms
    ByteBuffer (direct) = 938 ms
    

    A couple of individual runs favored CharBuffer.

    I also tried replacing the File-based IO with In-Memory IO and the performance was similar. If you are trying to transfer from one native stream to another, then you are better off using a "direct" ByteBuffer.

    With less than a 10% performance difference in practice, I would favor the CharBuffer. Its syntax is clearer, there are fewer extraneous variables, and you can do more with it directly (e.g. pass it to anything that asks for a CharSequence).

    Benchmark is below... it is slightly wrong as the BufferedReader is allocated inside the test-method rather than outside... however, the example below allows you to isolate the IO time and eliminate factors like a string or byte stream resizing its internal memory buffer, etc.

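    // Note: the enclosing declaration (public abstract class Main { ... }) and the
    // java.io / java.nio / java.util imports were omitted from the original post.
    public static void main(String[] args) throws Exception {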
        File f = getBytes(5000000);
        System.out.println(f.getAbsolutePath());
        try {
            System.gc();
            List<Main> impls = new java.util.ArrayList<Main>();
            impls.add(new CharArrayImpl());
            //impls.add(new CharArrayNoBuffImpl());
            impls.add(new CharBufferImpl());
            //impls.add(new CharBufferNoBuffImpl());
            impls.add(new ByteBufferDirectImpl());
            //impls.add(new CharBufferDirectImpl());
            for (int i = 0; i < 25; i++) {
                for (Main impl : impls) {
                    test(f, impl);
                }
                System.out.println("-----");
                if(i==0)
                    continue; //reset profiler
            }
            System.gc();
            System.out.println("Finished");
            return;
        } finally {
            f.delete();
        }
    }
    static int BUFFER_SIZE = 1000;
    
    static File getBytes(int size) throws IOException {
        File f = File.createTempFile("input", ".txt");
        FileWriter writer = new FileWriter(f);
        // Fill the file with `size` copies of the character '5'.
        for (int i = 0; i < size; i++) {
            writer.write('5');
        }
        writer.close();
        return f;
    }
    
    static void test(File f, Main impl) throws IOException {
        InputStream in = new FileInputStream(f);
        File fout = File.createTempFile("output", ".txt");
        try {
            OutputStream out = new FileOutputStream(fout, false);
            try {
                long start = System.currentTimeMillis();
                impl.runTest(in, out);
                long end = System.currentTimeMillis();
                System.out.println(impl.getClass().getName() + " = " + (end - start) + "ms");
            } finally {
                out.close();
            }
        } finally {
            fout.delete();
            in.close();
        }
    }
    
    public abstract void runTest(InputStream ins, OutputStream outs) throws IOException;
    
    public static class CharArrayImpl extends Main {
    
        char[] buff = new char[BUFFER_SIZE];
    
        public void runTest(InputStream ins, OutputStream outs) throws IOException {
            Reader in = new BufferedReader(new InputStreamReader(ins));
            Writer out = new BufferedWriter(new OutputStreamWriter(outs));
            int n;
            while ((n = in.read(buff)) >= 0) {
                out.write(buff, 0, n);
            }
            // Without a flush, chars still held in the BufferedWriter never reach outs.
            out.flush();
        }
    }
    
    public static class CharBufferImpl extends Main {
    
        CharBuffer buff = CharBuffer.allocate(BUFFER_SIZE);
    
        public void runTest(InputStream ins, OutputStream outs) throws IOException {
            Reader in = new BufferedReader(new InputStreamReader(ins));
            Writer out = new BufferedWriter(new OutputStreamWriter(outs));
            int n;
            while ((n = in.read(buff)) >= 0) {
                buff.flip();
                out.append(buff);
                buff.clear();
            }
            // Without a flush, chars still held in the BufferedWriter never reach outs.
            out.flush();
        }
    }
    
    public static class ByteBufferDirectImpl extends Main {
    
        ByteBuffer buff = ByteBuffer.allocateDirect(BUFFER_SIZE * 2);
    
        public void runTest(InputStream ins, OutputStream outs) throws IOException {
            ReadableByteChannel in = Channels.newChannel(ins);
            WritableByteChannel out = Channels.newChannel(outs);
            int n;
            while ((n = in.read(buff)) >= 0) {
                buff.flip();
                out.write(buff);
                buff.clear();
            }
        }
    }
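
The point above about CharSequence can be illustrated with a short sketch. Because CharBuffer implements CharSequence, a filled and flipped buffer can be handed to APIs such as java.util.regex without first copying it into a String. The class name CharSequenceDemo and the countWords helper are hypothetical names used only for this example.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {
    // Counts runs of word characters in any CharSequence, including a CharBuffer.
    static int countWords(CharSequence text) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        CharBuffer buf = CharBuffer.allocate(64);
        buf.put("three small words");
        buf.flip(); // expose only the chars that were written
        System.out.println(countWords(buf)); // 3
    }
}
```

With a char[] the same call would need an intermediate new String(buf, 0, n) or a CharBuffer.wrap(buf, 0, n).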
    
  • You should avoid CharBuffer in recent Java versions because there is a bug in CharBuffer#subSequence(). You cannot get a subsequence from the second half of the buffer, since the implementation confuses capacity and remaining. I observed the bug in Java 1.6.0_11 and 1.6.0_12.
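
A quick way to check whether a given JDK is affected is to take a subsequence from the second half of a buffer, as in the sketch below. The class name SubSequenceCheck and the tail helper are hypothetical names. The sketch assumes the documented behaviour, in which the start and end indices are relative to the buffer's current position; on an affected JDK the call would be expected to fail instead of returning the text.

```java
import java.nio.CharBuffer;

public class SubSequenceCheck {
    // Moves the position, then returns subSequence(start, end) as a String.
    static String tail(char[] chars, int position, int start, int end) {
        CharBuffer buf = CharBuffer.wrap(chars);
        buf.position(position);
        // Indices are relative to the current position, not to index 0.
        return buf.subSequence(start, end).toString();
    }

    public static void main(String[] args) {
        // Position 6 is in the second half of a 10-char buffer.
        System.out.println(tail("0123456789".toCharArray(), 6, 1, 3)); // 78 on an unaffected JDK
    }
}
```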
