Discovery Processing Mode

Last Updated: May 14, 2014 05:44PM EDT
What is Discovery processing mode?
Discovery processing mode is Semantria's discovery tool. It outputs the most commonly used words and phrases in a dataset, giving you an idea of what the dataset is about and how best to approach categorizing it. In essence, it provides a 'big picture' view of the dataset.

In Discovery processing mode, Semantria returns the most commonly used facets (nouns) in the dataset, a count of the positive, negative, and neutral sentiment occurrences, the attributes for each facet, and the attribute count.

Does Semantria count a collection of documents as a single document or does it count every document within the Discovery analysis?
Semantria counts every document within the collection and subtracts that number from the user’s API balance. The same applies to a batch of documents in document processing mode. The only difference is the output, which differs between collection and document processing modes.
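As a rough illustration of the billing rule above, the sketch below estimates the remaining balance after submitting one or more collections. The function name and the idea of tracking the balance locally are assumptions made for illustration; they are not part of the Semantria API.

```python
from typing import List


def balance_after_submission(balance: int, collections: List[List[str]]) -> int:
    """Return the balance left after every document in every collection is counted.

    Hypothetical helper: each document costs one unit, whether it is sent
    in a collection or on its own.
    """
    used = sum(len(collection) for collection in collections)
    if used > balance:
        raise ValueError(f"{used} documents needed, but only {balance} left in the balance")
    return balance - used


# Two collections of 100 and 40 documents consume 140 units in total.
remaining = balance_after_submission(1000, [["text"] * 100, ["text"] * 40])
print(remaining)  # 860
```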
What is the recommended size of Discovery analysis?
It depends entirely on the content. If the documents cover similar topics that recur in every document, a large number of facets and attributes will be extracted. If the texts are logically similar, we recommend using the largest collection possible. By default, Semantria limits a collection to 100 documents per Discovery analysis.
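If you have more documents than the default limit allows, one option is to split them into several collections before submitting. The following is a minimal sketch of that splitting step only; the helper name is hypothetical, and sending each collection to Semantria is left out.

```python
from typing import Iterator, List

# Default per-collection limit mentioned above; change it if your account is configured differently.
MAX_COLLECTION_SIZE = 100


def split_into_collections(
    documents: List[str], max_size: int = MAX_COLLECTION_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive slices of `documents`, each with at most `max_size` items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(documents), max_size):
        yield documents[start:start + max_size]


docs = [f"document {i}" for i in range(250)]
sizes = [len(collection) for collection in split_into_collections(docs)]
print(sizes)  # [100, 100, 50]
```

Keep in mind that facets are aggregated per collection, so a term that appears once in each of two separate collections will not reach the extraction threshold in either.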
Why didn’t I get any facets from my Discovery analysis?
Discovery analyses are designed to work with a large amount of data. A facet or attribute appears in the output only if it is mentioned in the source collection at least twice. Nouns or phrases that appear only once are not extracted by the Semantria service. The same principle applies to the attributes related to the facets.
In Discovery processing mode, all output is aggregated across the documents and carries no explicit reference to the original documents within the Discovery analysis. Facets are nouns, and attributes are the adjectives describing those nouns, both aggregated across the documents of the Discovery analysis.
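The "at least twice" rule can be pictured with a small sketch. This is not how Semantria extracts facets internally; it assumes the nouns have already been identified for each document (the `nouns_per_document` input is hypothetical) and only shows the effect of the minimum-mention threshold on aggregated counts.

```python
from collections import Counter
from typing import Dict, Iterable, List

MIN_MENTIONS = 2  # threshold described above


def frequent_facets(
    nouns_per_document: Iterable[List[str]], min_mentions: int = MIN_MENTIONS
) -> Dict[str, int]:
    """Count nouns across all documents and keep those seen at least `min_mentions` times."""
    counts: Counter = Counter()
    for nouns in nouns_per_document:
        # Lowercase so that "Apple" and "apple" count as the same candidate facet.
        counts.update(noun.lower() for noun in nouns)
    return {noun: n for noun, n in counts.items() if n >= min_mentions}


sample = [["apple", "price"], ["Apple", "store"], ["apple"]]
print(frequent_facets(sample))  # {'apple': 3}
```

Here "price" and "store" each occur once, so they are dropped, which is why a small or very varied collection can return no facets at all.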
Where is the sentiment score for facets?
Discovery mode does not provide a sentiment score for facets because it is technically difficult to calculate a single sentiment score for the same facet/attribute pair across all the documents.
For example, some documents may mention a "ripe apple" negatively, while others mention it neutrally or with varying degrees of positivity.
Instead, Semantria counts the number of positive, neutral and negative mentions of the facet and responds with “positive_count”, “neutral_count” and “negative_count” fields accordingly.
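If a rough overall picture is still useful, the three counts can be turned into proportions on the client side. The sketch below assumes a facet arrives as a dictionary containing the three count fields named above; the "label" key and the overall record shape are assumptions for illustration, so check them against the actual response you receive.

```python
from typing import Dict


def sentiment_breakdown(facet: Dict) -> Dict[str, float]:
    """Return the share of positive, neutral and negative mentions for one facet."""
    counts = {
        "positive": facet.get("positive_count", 0),
        "neutral": facet.get("neutral_count", 0),
        "negative": facet.get("negative_count", 0),
    }
    total = sum(counts.values())
    if total == 0:
        # No mentions recorded, so every share is zero.
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


# Hypothetical facet record; only the three *_count field names come from the text above.
facet = {"label": "ripe apple", "positive_count": 6, "neutral_count": 3, "negative_count": 1}
print(sentiment_breakdown(facet))  # {'positive': 0.6, 'neutral': 0.3, 'negative': 0.1}
```

These proportions describe how often a facet was mentioned in each way; they are not a substitute for a sentiment score.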
