Creating customizable, interactive data visualizations for Natural Language Processing with the Amazon MXNet team.
People don't understand machine learning data.
Machine learning, and in particular Natural Language Processing, has become a black box that most people just accept as a part of their lives.
"Hey Siri", "Hey Alexa", and "Ok Google" have all become nearly household phrases, but understandingthe data underneath their responses is left to the experts.
...and being able to communicate the importance of these data is even more difficult.
Although data visualization methods for NLP exist, little more robust than the brat rapid annotation tool is currently in production.
Further, web apps like this try to visualize things like Twitter sentiment but fail to provide clear context for the content.
This is where Emory NLP comes in: providing a complete, dynamic, and interactive visualization system for researchers and the public alike.
We need a dynamic and interactive visualization system for Natural Language Processing.
I love graphic design and take a lot of inspiration from robust logo systems, specifically ones like those below, because they are adaptable and, although they look different with different data, inherently unified.
From here, I knew I needed an adaptable visual system: one that could take complex data, let users explore those data visually, and express differences between parts of the data through animation, color, and scale.
Instead of viewing word dependencies, parts of speech, and other metadata in a cluttered fashion all at once, I prototyped a depth-separated menu for every word that shows its dependencies, part of speech, and coreferences.
Instead of being confined to traditional 2D scrolling to analyze a long list of sentences, I created a 3D sentence-scrolling concept that allows users to focus on any given sentence while still maintaining context from its surrounding content.
Playing with language
What does the structure of grammar look like, and how can you explore it?
To address a fundamental issue in understanding how people learn language and grammar, linguistics and NLP researchers often use dependency trees to visualize the grammatical relationships between words in a sentence. However, most diagrams are static and offer little context about where the relationships come from, or how they map back onto the running text.
I took this problem and created a small prototype in Principle that smoothly transitions a normal sentence into a dependency tree that allows you to explore deeper and focus on individual words. I will soon be implementing this with D3.js.
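To make the idea concrete, here is a minimal sketch of how such a dependency tree could be represented and "treeified" for a renderer like D3. The token shape and `treeify` helper are my own illustration, not the project's actual data model; the relation labels follow the Universal Dependencies convention.

```javascript
// Each token points to the array index of its syntactic head;
// the root uses head = -1. This shape is a hypothetical illustration.
const sentence = [
  { form: "The",    pos: "DET",  head: 1,  deprel: "det"   },
  { form: "dog",    pos: "NOUN", head: 2,  deprel: "nsubj" },
  { form: "chased", pos: "VERB", head: -1, deprel: "root"  },
  { form: "the",    pos: "DET",  head: 4,  deprel: "det"   },
  { form: "cat",    pos: "NOUN", head: 2,  deprel: "obj"   },
];

// "Treeify": group tokens under their heads so a hierarchical layout
// (e.g. D3's tree layout) can draw the result.
function treeify(tokens) {
  const nodes = tokens.map(t => ({ ...t, children: [] }));
  let root = null;
  for (const node of nodes) {
    if (node.head === -1) root = node;
    else nodes[node.head].children.push(node);
  }
  return root;
}

const tree = treeify(sentence);
console.log(tree.form);                      // → "chased"
console.log(tree.children.map(c => c.form)); // → ["dog", "cat"]
```

"Textify" is then just the reverse: flattening the tree back into the original token order.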
Perhaps my favorite part about this was the button I created that transitions between "Treeify" and "Textify", because it visually embodies the transition between visualization states, something that I consider very important in button and interaction design.
A new sentiment visualization method
How do you "rate" a sentence?
First, we have to look at how the data structures were organized once a sentence was placed through the sentiment analysis algorithm:
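Since the actual structure was shown graphically, here is a hypothetical sketch of what one analyzed sentence might look like, assuming a softmax-style class distribution over negative/neutral/positive plus a per-word weight. All field names and values here are illustrative assumptions, not the project's real schema.

```javascript
// Hypothetical output shape for one sentence -- field names and values
// are illustrative assumptions, not the actual schema.
const analyzed = {
  text: "I really love this movie",
  // Probabilities over [negative, neutral, positive]; they sum to 1.
  sentiment: [0.05, 0.15, 0.80],
  // One weight per word, indicating its contribution to the score.
  weights: [0.05, 0.25, 0.45, 0.10, 0.15],
};

// Sanity check: class probabilities should sum to (approximately) 1.
const total = analyzed.sentiment.reduce((a, b) => a + b, 0);
```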
From here, we went to the drawing board and took this general structure to see how we could map these values into a digestible visualization.
Luckily, we noticed a pattern...
We realized that the sentiment score of each sentence could be represented as a vector.
By multiplying each component of the sentiment vector by 255, we could produce a corresponding RGB value.
This would now produce the following mappings:
Very negative is very red.
Very neutral is very green.
Very positive is very blue.
Then, using the different weights of each word, we can translate that into either opacity or relative scale.
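Put together, the whole mapping is only a few lines of code. This is a minimal sketch under the assumption that the sentiment vector is ordered [negative, neutral, positive] with components in 0–1, and that word weights also fall in 0–1; the function names and the scale range are illustrative.

```javascript
// Map a [negative, neutral, positive] sentiment vector (components in
// 0..1) to an RGB triple by scaling each component by 255.
function sentimentToRGB(sentiment) {
  const [neg, neu, pos] = sentiment;
  return {
    r: Math.round(neg * 255), // very negative -> very red
    g: Math.round(neu * 255), // very neutral  -> very green
    b: Math.round(pos * 255), // very positive -> very blue
  };
}

// Map a word's weight (0..1) to an opacity, clamping out-of-range values.
function weightToOpacity(weight) {
  return Math.min(Math.max(weight, 0), 1);
}

// Or to a relative scale, e.g. a font size between 16px and 32px.
function weightToScale(weight, base = 16, range = 16) {
  return base + weightToOpacity(weight) * range;
}

console.log(sentimentToRGB([1, 0, 0])); // → { r: 255, g: 0, b: 0 }
console.log(sentimentToRGB([0, 0, 1])); // → { r: 0, g: 0, b: 255 }
```

Each word can then be rendered in its sentence's color, with its weight controlling how opaque or how large it appears.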
All of this put together gives us a dynamic, adaptable visualization system that produces some interesting results: