Support Center


User Defined Categories

Edgar Fenn Jul 28, 2015 08:22PM EDT

Hi, I am selecting terms for a user defined category, and need some more information about how Semantria uses these terms to identify text. For example, I have an Acquisition and Partnership category (referring to business), and one of my terms is "merger." For the best identification, should I instead find the Wikipedia article that discusses mergers and use "mergers_and_acquisition" as the term? Or should I select a Wikipedia subcategory which Semantria already recognizes, such as "Business: Strategic_alliances"? Which of these three options would yield the most favorable results?
Thanks

Ethan Jul 29, 2015 12:57PM EDT Semantria Agent

Hi Edgar,

Semantria categories use a categorization model (aka the Concept Matrix) to support “fuzzy” categorization of your document set based on the terms that you provide. With a traditional boolean query, you specify exactly which terms must or must not appear in a matching document. With the Concept Matrix, we can support conceptual matches that return documents from your set related to your query, even if different terms are used. For example, a query for ‘food’ in a boolean search engine would return only articles using the specific word ‘food’, but in a category you’d also get results that discuss pizza, sandwiches, and other phrases we associate with food.

As for best practices when specifying your list of terms:
Although you can create long, complicated category queries, we do not recommend the practice. Besides being harder to maintain, results generally degrade with complexity. A better solution is often to break very broad topics into smaller subtopics, and map these subtopics together in a consuming application. That is, rather than having a single query for all of ‘Business’, you may have better results pulling in ‘human resources’, ‘regulatory compliance’, etc. separately.
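
To illustrate mapping subtopics together in a consuming application, here is a minimal Python sketch. It is not part of the Semantria API: the mapping table, the `roll_up` function, and the example scores are all hypothetical, and it assumes your application already holds a score per narrow category for each document.

```python
# Hypothetical mapping from narrow category names to the broad topic they
# belong to. These names are illustrative, not Semantria identifiers.
SUBTOPIC_TO_TOPIC = {
    "human resources": "Business",
    "regulatory compliance": "Business",
    "mergers and acquisitions": "Business",
}


def roll_up(matches):
    """Combine per-subtopic scores into one score per broad topic.

    `matches` maps a category name to a relevance score for one document.
    The broad topic takes the highest score among its subtopics; categories
    without a mapping are passed through unchanged.
    """
    topics = {}
    for name, score in matches.items():
        topic = SUBTOPIC_TO_TOPIC.get(name, name)
        topics[topic] = max(topics.get(topic, 0.0), score)
    return topics


# Made-up scores for a single document.
example = {"human resources": 0.42, "regulatory compliance": 0.71, "sports": 0.10}
print(roll_up(example))
```

Taking the maximum is only one possible design choice; summing or averaging the subtopic scores may suit some applications better.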

Ad hoc analysis of results is appropriate during rapid, informal development of category queries. However, when you’re looking for optimal performance and robust results, formal testing is encouraged. By annotating each document with the categories you expect it to match, then calculating precision and recall for each query, you can use sound statistical data to guide decisions. With the annotations in place, you can make small changes to a query’s list of terms and weightings and see whether your results improve. You can also identify categories the Concept Matrix has difficulty with and handle those queries specially (either by breaking the category into more specific subtopics, or by falling back to more traditional query-based topics). Precision and recall are calculated as follows:

Precision = # of correct matches / (# of correct matches + # of incorrect matches)
Recall = # of correct matches / (# of correct matches + # of matches missed )
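
The two formulas above can be computed with a short Python sketch like the one below. The function name and the document IDs are assumptions for illustration; it only assumes you have, for one category, the set of documents you annotated as belonging to it and the set of documents the category actually matched.

```python
def precision_recall(expected, returned):
    """Compute precision and recall for one category query.

    expected: IDs of documents annotated as belonging to the category.
    returned: IDs of documents the category actually matched.
    """
    expected, returned = set(expected), set(returned)
    correct = len(expected & returned)    # correct matches
    incorrect = len(returned - expected)  # incorrect matches
    missed = len(expected - returned)     # matches missed
    # Guard against division by zero when either set is empty.
    precision = correct / (correct + incorrect) if returned else 0.0
    recall = correct / (correct + missed) if expected else 0.0
    return precision, recall


# Hypothetical annotation set: four documents should match, three were returned.
expected_docs = {"doc1", "doc2", "doc3", "doc4"}
returned_docs = {"doc1", "doc2", "doc5"}
p, r = precision_recall(expected_docs, returned_docs)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Re-running this after each change to a category's terms or weightings shows whether the change helped or hurt.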

With regard to using the underscore syntax for a category term, please note that when the Concept Matrix is given a term/phrase, it matches both the phrase form and the individual words. Thus ‘power plant’, while matching stories about electric generation most strongly, may also pull in documents about plant life. In most queries the individual words in a phrase are related and contribute positively. But in cases where the individual words mean something different on their own, the underscore instructs the engine to use only the phrase form. Thus ‘power_plant’ will not match documents about flowers at all.
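
As a rough illustration of the underscore rule described above, the following Python sketch lists which forms a term would contribute. This is a toy model, not the engine's implementation; the `expand_term` function is hypothetical and ignores weighting entirely.

```python
def expand_term(term):
    """Return the forms a category term contributes under the rule above.

    Toy model only: a space-separated phrase contributes the phrase plus
    each individual word, while an underscore-joined phrase contributes
    the phrase form alone.
    """
    if "_" in term:
        # Underscore: use only the phrase form.
        return [term.replace("_", " ")]
    words = term.split()
    if len(words) > 1:
        # Plain phrase: phrase form plus the individual words.
        return [term] + words
    return [term]


print(expand_term("power plant"))  # ['power plant', 'power', 'plant']
print(expand_term("power_plant"))  # ['power plant']
```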

I hope that addressed your questions. Please let us know if you have further questions. Thanks.

Regards,

Ethan Thong
Support Engineer

Ethan Aug 04, 2015 12:37PM EDT Semantria Agent

Hi Edgar,

We want to check in with you to see if the information we provided answered all your questions or if you have further questions. Please let us know. Thank you.

Regards,

Ethan Thong
Support Engineer

Ethan Aug 06, 2015 12:42PM EDT Semantria Agent

Hi Edgar,

We want to check in with you to see if the information we provided answered all your questions or if you have further questions. Please let us know so we can close the ticket or provide you with more information. Thank you.

Regards,

Ethan Thong
Support Engineer

Ethan Aug 10, 2015 11:37AM EDT Semantria Agent

Hi Edgar,

Since we have not heard back from you, we will assume that all of your questions about categories have been answered and will proceed to close this ticket. Should you have any further questions, you can open another ticket by contacting support@semantria.com. Thank you.

Regards,

Ethan Thong
Support Engineer

