Keith Devens .com
Sunday, October 12, 2008

Adam Langley (http://www.imperialviolet.org) wrote:
Keith (http://www.keithdevens.com/) wrote:
Personally, I would be far happier if it stored them internally as byte arrays and only used the character encoding when doing string operations on them.
Absolutely. Perl and Python seem to just store whatever you give them in the string and generally don't make you worry about encoding until you want to get data back out, which seems like the way to go. I do wonder, though: Java seems to be the first language to have considered Unicode from the start, while most other languages, such as Perl and Python, had it tacked on later (though they seem to make me worry about it less than Java does; figure that). My point is that maybe Perl and Python would have done it more like Java had they considered Unicode from the start, but I'm not sure.
Here's how Tim Bray handled an implementation of Unicode strings for Java. See my wiki page on Unicode for more from him and otherwise. Feel free to add your own links.
Sam Newman (http://www.magpiebrain.com) wrote:
So Java strings are completely useless for storing binary data.
Well, yes. It's a string of character data. I would no more use a String for storing binary data than I would use a byte array for storing a String for display purposes. Java Strings are designed to be seen, and the String class and its associated classes are written to make using these strings for display purposes as easy as possible, so I am not surprised at all by its behaviour.
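To make the point about binary data concrete, here is a minimal sketch of what happens when arbitrary bytes are pushed through a String. It assumes Java 7 or later (for `StandardCharsets`); the class name `BinaryInString`, the helper `roundTrip`, and the sample bytes are all made up for illustration. Bytes that are not valid UTF-8 get replaced during decoding, so the data does not survive the trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryInString {
    // Hypothetical helper: decode raw bytes as UTF-8 text, then encode the text back to bytes.
    static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never appear in well-formed UTF-8, so decoding replaces them.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        byte[] back = roundTrip(raw);

        System.out.println("original:   " + Arrays.toString(raw));
        System.out.println("round trip: " + Arrays.toString(back));
        System.out.println("identical:  " + Arrays.equals(raw, back));
    }
}
```

Pure ASCII input would survive this round trip unchanged, which is probably why the problem tends to go unnoticed until real binary data shows up.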
This means that while in every other language I was just able to store binary data inside a string - which meant that I could use normal input and output functions, as well as the automatic memory management built-in - in Java I'm going to have to store my data as byte arrays, do my own memory management, and I or any other user of the library will probably have to worry about converting back and forth between bytes and characters. Yet again, Java makes my life harder than it should be.
I'm unsure what you mean by memory management here. In any case, the main thrust of my comment is that you are using Strings for something other than the purpose for which they were written. If you want to store some binary data, you could use a byte[] of course, but you'd be better off using a ByteBuffer (part of the java.nio package) or, for earlier versions, just use a ByteArrayInputStream.
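As a rough illustration of the ByteBuffer suggestion, the sketch below packs a payload behind a 4-byte length header and reads it back. The record layout, the class name `BinaryBufferSketch`, and the `pack`/`unpack` helpers are assumptions made up for this example, not anything from the library under discussion; only the `java.nio.ByteBuffer` calls themselves are standard.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BinaryBufferSketch {
    // Hypothetical layout: a 4-byte big-endian length, followed by the payload bytes.
    static ByteBuffer pack(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length);
        buf.put(payload);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    // Reads one record written by pack() from the buffer's current position.
    static byte[] unpack(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] payload = new byte[length];
        buf.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        byte[] back = unpack(pack(raw));
        System.out.println("identical: " + Arrays.equals(raw, back));
    }
}
```

The buffer is still garbage-collected like any other object, so no manual memory management is involved, and no character encoding is ever applied to the bytes.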

The problem is that Java's concept of a string is an array of characters, and it stores them internally as Unicode. It then translates them to a given encoding as they go in and out.
Personally, I would be far happier if it stored them internally as byte arrays and only used the character encoding when doing string operations on them. That way you could treat them just as binary if you set a character encoding of 'binary', and splices etc. would work on byte numbers. If you then tagged it as 'utf-8', splices would start using letter indexes instead.
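The difference between byte numbers and letter indexes can be seen with a short sketch, assuming Java 7 or later for `StandardCharsets`. The class name `CharVsByteIndex` and its helpers are hypothetical. Note that `String.length()` counts UTF-16 code units, which matches the letter count for the sample word here but not for every possible string.

```java
import java.nio.charset.StandardCharsets;

public class CharVsByteIndex {
    // Number of UTF-16 code units in the string, as Java indexes it internally.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String word = "h\u00e9llo"; // "hello" with an e-acute as the second letter

        System.out.println("chars:      " + charCount(word));     // 5
        System.out.println("utf-8 size: " + utf8ByteCount(word)); // 6
        // substring works on character indexes, so this keeps the whole accented letter.
        System.out.println("first two:  " + word.substring(0, 2));
    }
}
```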
(And just as a warning, Java can't handle unsigned byte arrays: its byte type is always signed. The only Java programming I've ever done involved crypto, and signed/unsigned problems convinced me never to use Java again.)
AGL
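For readers who hit the signed/unsigned problem mentioned above, a common workaround is to mask each byte with `0xFF` when widening it to an int. This is a minimal sketch; the class name `UnsignedByteSketch` and the helper `toUnsigned` are made up for illustration.

```java
public class UnsignedByteSketch {
    // Widen a signed Java byte (-128..127) to its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0; // stored as -16 because byte is signed

        System.out.println("as signed byte:  " + b);             // -16
        System.out.println("as unsigned int: " + toUnsigned(b)); // 240
    }
}
```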