Can multiple docs be summarized into one? If so, what are the best practices for it?
At present, the engine does not produce a single summary of multiple documents; each document is summarized on its own. To approximate a combined summary, there are two approaches:
o Join the documents into a single document and summarize it. This may unfairly weight the first document unless the documents' content is strongly correlated. Keep the combined text within the maximum document length.
o Summarize each document individually and join the summaries together. If the documents' content is not strongly correlated, the result may read as an odd, disjointed summary.
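The two approaches can be sketched in Python. This is a minimal illustration under stated assumptions: `summarize` is a hypothetical stand-in (a simple word-frequency extractive summarizer, not the Salience algorithm), and `MAX_DOC_LENGTH` is a placeholder for whatever maximum document length the engine actually enforces.

```python
import re
from collections import Counter

# Placeholder only: the engine's real maximum document length is not stated here.
MAX_DOC_LENGTH = 8192  # characters (assumed)

def summarize(text: str, limit: int = 3) -> str:
    """Hypothetical stand-in summarizer (NOT Salience's algorithm).
    Scores each sentence by the summed corpus frequency of its words and
    returns the top `limit` sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w.lower() for w in re.findall(r"\w+", text))

    def score(sentence: str) -> int:
        return sum(freq[w.lower()] for w in re.findall(r"\w+", sentence))

    # sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:limit])
    return " ".join(sentences[i] for i in keep)

def summarize_glued(docs: list[str], limit: int = 3) -> str:
    """Approach 1: join the documents, check the length, summarize once."""
    combined = "\n\n".join(docs)
    if len(combined) > MAX_DOC_LENGTH:
        raise ValueError(f"combined text is {len(combined)} chars; limit is {MAX_DOC_LENGTH}")
    return summarize(combined, limit)

def summarize_each_then_glue(docs: list[str], limit: int = 3) -> str:
    """Approach 2: summarize each document separately, then join the summaries."""
    return " ".join(summarize(doc, limit) for doc in docs)

docs = [
    "Solar power is growing fast. Costs fell again this year. Analysts expect more growth.",
    "Wind power also expanded. Offshore farms drove most of the gains.",
]
print(summarize_glued(docs, limit=2))
print(summarize_each_then_glue(docs, limit=1))
```

In the first approach, sentences from all documents compete in one ranking, so a single document can dominate the result; the second approach guarantees each document is represented but makes no attempt to connect the pieces.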
What is the optimal summary length for ½-page, 1-page, and 2-page documents?
The default summary length is three sentences, which works well for a typical news article of a couple of pages. For a longer document, such as one of 12 pages, the customer may want to increase the length slightly. The right length also depends on how diverse the content is.
How do customers set the summary length in their configuration?
The summary length is controlled by the “summary_limit” setting, described on pages 6-7 of the Semantria Frontend API document.
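As a hedged illustration, a configuration body that raises the limit might be assembled as follows. The field name `summary_limit` comes from the answer above; the surrounding structure (a `name` field and a nested `document` object) and the helper `build_config` are assumptions, so check the Semantria Frontend API document for the actual schema, endpoint, and authentication.

```python
import json

# Default stated above: a three-sentence summary.
DEFAULT_SUMMARY_LIMIT = 3

def build_config(name: str, summary_limit: int = DEFAULT_SUMMARY_LIMIT) -> str:
    """Hypothetical helper: return a JSON configuration body that sets
    summary_limit. The "name" field and the nesting under "document"
    are assumptions, not the confirmed Semantria schema."""
    if summary_limit < 1:
        raise ValueError("summary_limit must be at least one sentence")
    config = {
        "name": name,                                  # assumed field
        "document": {"summary_limit": summary_limit},  # assumed nesting
    }
    return json.dumps(config)

# Example: a slightly longer summary for long reports.
print(build_config("long-reports", summary_limit=5))
# → {"name": "long-reports", "document": {"summary_limit": 5}}
```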
Can you explain more about what goes on behind the scenes? What NLP algorithms does the system use? Does it implement a published academic multi-document summarization algorithm or system?
When Salience analyzes a document, it performs the following NLP processes:
Tokenization: breaking the document up into individual tokens (words, punctuation, etc.)
Tagging: determining the part-of-speech tag for each token
Chunking: grouping sets of tokens into grammatical chunks
Naturally, the techniques we’ve developed to implement these NLP processes are our intellectual property. But that set of processes gives us the fundamental understanding of the content, which we then use to extract entities, detect sentiment, extract themes, generate summaries, etc.
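Salience's own implementations of these steps are proprietary and are not described here. Purely to show what each step produces, the toy Python pipeline below swaps in much simpler techniques: a regular-expression tokenizer, a tagger built from a small hand-made lexicon plus crude suffix rules, and a chunker that groups runs of determiners, adjectives, and nouns into noun-phrase (NP) chunks. The function names, the lexicon, and the use of Penn Treebank-style tags are illustrative assumptions, not Salience's output format.

```python
import re

# Toy lexicon for illustration only; real taggers are trained on large
# annotated corpora rather than hand-listed words.
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "is": "VBZ", "was": "VBD", "are": "VBP",
    "quick": "JJ", "new": "JJ", "strong": "JJ",
    ".": ".", ",": ",",
}
NP_TAGS = {"DT", "JJ", "NN"}  # tags that may form part of a noun phrase

def tokenize(text: str) -> list[str]:
    """Tokenization: split text into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Tagging: lexicon lookup first, then crude suffix rules,
    defaulting to noun (NN) for anything unrecognized."""
    tagged = []
    for tok in tokens:
        low = tok.lower()
        if low in LEXICON:
            pos = LEXICON[low]
        elif low.endswith("ly"):
            pos = "RB"
        elif low.endswith("ed"):
            pos = "VBD"
        elif low.endswith("ing"):
            pos = "VBG"
        else:
            pos = "NN"
        tagged.append((tok, pos))
    return tagged

def chunk(tagged: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Chunking: group each maximal run of DT/JJ/NN tokens into an NP chunk;
    every other token becomes its own chunk labeled with its tag."""
    chunks, current = [], []
    for tok, pos in tagged:
        if pos in NP_TAGS:
            current.append(tok)
            continue
        if current:
            chunks.append(("NP", current))
            current = []
        chunks.append((pos, [tok]))
    if current:
        chunks.append(("NP", current))
    return chunks

print(chunk(tag(tokenize("The new engine extracted strong themes."))))
# → [('NP', ['The', 'new', 'engine']), ('VBD', ['extracted']), ('NP', ['strong', 'themes']), ('.', ['.'])]
```

Each stage consumes only the previous stage's output, which mirrors the ordering described above: chunks such as noun phrases are the kind of structure later features (entities, themes, summaries) can build on.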
All of this occurs on a document-by-document basis. Summaries are generated for individual documents; Salience does not merge summaries across documents within a session. Even the collection-level functionality in Salience Five does not include multi-document summarization.