I have noticed that Gephi gives an error when importing a NodeXL file that was exported by DiscoverText. The error is “unexpected exception: Java Language Runtime exception” (Note: see below for the entire error message).
The data consists of tweets and top hashtag metadata in Arabic. I also can't open the exported CSV file in Excel. Why is this happening?
The Gephi issue is on Gephi's end, and unfortunately we cannot do anything about it from DiscoverText because it depends on the data. The cause is not necessarily that the tweets are in Arabic, but rather that a user has added control characters to the tweet itself.
All data exported from DiscoverText is UTF-8 encoded, and if the file contains a character that Gephi cannot process, an error like this can occur. The error message below reports Unicode 0xb, a vertical tab, which is one of the control characters that XML 1.0 does not allow. To get the import to work, we suggest pre-processing the file to remove the bad characters. A search on Google came up with a good explanation and some code for handling this:
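Independently of that reference, here is a minimal Python sketch of the kind of pre-processing described above. It is not DiscoverText or Gephi code, and the function names are hypothetical. It assumes the exported file is UTF-8 text and that the offending characters appear literally in the file (as the 0xb in the error message suggests), and it simply deletes the control characters that XML 1.0 forbids while leaving tab, line feed, carriage return, and all Arabic text untouched.

```python
import re

# Characters XML 1.0 does not allow in a document: the C0 control characters
# other than tab (0x09), line feed (0x0A), and carriage return (0x0D),
# plus the non-characters U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]")


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with characters that are illegal in XML 1.0 removed."""
    return _INVALID_XML_CHARS.sub("", text)


def clean_file(src_path: str, dst_path: str) -> None:
    """Copy a UTF-8 file to dst_path, dropping illegal XML characters."""
    # newline="" keeps the original line endings unchanged.
    with open(src_path, encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(strip_invalid_xml_chars(line))
```

For example, `clean_file("export.graphml", "export_clean.graphml")` (both file names are placeholders) would write a cleaned copy that you could then try importing into Gephi.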
As for not being able to open the exported CSV file in Excel, we've found that by default Excel opens CSV files using a code page other than UTF-8, so non-Latin text such as Arabic is displayed incorrectly. Please see this FAQ about encoding errors for more information about this topic:
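Separately from that FAQ, one common workaround is to re-save the CSV with a UTF-8 byte order mark (BOM), which Excel generally uses as a hint that the file is UTF-8. The sketch below is an illustration under that assumption, not a DiscoverText feature, and the function name is hypothetical. It uses Python's standard `utf-8-sig` codec, which drops an existing BOM when reading and writes one when saving.

```python
def add_utf8_bom(src_path: str, dst_path: str) -> None:
    """Copy a UTF-8 CSV to dst_path, ensuring it starts with exactly one BOM."""
    # Reading with "utf-8-sig" removes a BOM if the file already has one;
    # writing with "utf-8-sig" puts a single BOM at the start of the output.
    # newline="" keeps the original line endings unchanged.
    with open(src_path, encoding="utf-8-sig", newline="") as src, \
            open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        # Copy in chunks so large exports do not have to fit in memory.
        for chunk in iter(lambda: src.read(65536), ""):
            dst.write(chunk)
```

For example, `add_utf8_bom("export.csv", "export_excel.csv")` (placeholder file names) writes a copy to try opening in Excel. If the text is still garbled, Excel's own text import option, where the file encoding can be chosen explicitly, is the alternative.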
Note: Instructions to export a bucket to a NodeXL file.
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[76092,42]
Message: An invalid XML character (Unicode: 0xb) was found in the element content of the document.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)