
Website Content Crawling

Last Updated: May 07, 2014 04:56PM EDT
Diffbot/Semantria

Semantria does not natively collect data for processing; the product focuses on sentiment analysis and text analytics.

However, Semantria works closely with Diffbot, a developer of machine learning and computer vision algorithms and public APIs for web scraping. Together, Diffbot and Semantria can return analytical data for a web page with a single API call.

Diffbot’s Automatic API takes the URLs of articles or blog posts you provide and uses computer vision to automatically identify and extract article-specific data, such as the title, author, publication date, full text, images, and videos. The extracted data can then be sent either to you or to Semantria for analysis.

When the data is sent to Semantria for analysis, you can also specify which Semantria configuration ID should be used.
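The combined call described above is a single HTTP request to Diffbot that names the page to extract and, optionally, the Semantria configuration to apply. The sketch below only builds such a request URL. It is a minimal illustration under stated assumptions: the endpoint `https://api.diffbot.com/v2/article` and the `token`/`url` parameters are assumed, and the `semantria_config_id` parameter name is hypothetical, not Diffbot's documented name. Check Diffbot's current documentation before use.

```python
from urllib.parse import urlencode

# Assumption: Diffbot article endpoint; verify against Diffbot's documentation.
DIFFBOT_ARTICLE_ENDPOINT = "https://api.diffbot.com/v2/article"


def build_diffbot_request(token, article_url, semantria_config_id=None):
    """Build a GET URL asking Diffbot to extract one article.

    The 'semantria_config_id' query parameter name is HYPOTHETICAL; it only
    illustrates where a Semantria configuration ID would be passed.
    """
    params = {"token": token, "url": article_url}
    if semantria_config_id is not None:
        params["semantria_config_id"] = semantria_config_id  # hypothetical name
    # urlencode percent-encodes the article URL so it is safe as a query value
    return DIFFBOT_ARTICLE_ENDPOINT + "?" + urlencode(params)
```

For example, `build_diffbot_request("YOUR_TOKEN", "http://example.com/post", "cfg-1")` returns a URL that can be fetched with any HTTP client; omitting the configuration ID leaves that parameter out entirely.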

See Diffbot's documentation to learn more about the Automatic API.

By default, when a URL is given to Diffbot and the scraped content is sent to Semantria for analysis, Diffbot and Semantria return the following outputs:
 
  • id
  • tag
  • summary
  • language
  • language_score
  • sentiment_score
  • sentiment_polarity
  • auto_categories
    • title
    • type
    • strength_score
    • categories
  • themes
    • title
    • is_about
    • strength_score
    • sentiment_score
    • sentiment_polarity
  • entities
    • title
    • type
    • evidence
    • confident
    • entity_type
    • sentiment_score
    • sentiment_polarity
    • themes
      • title
      • is_about
      • strength_score
      • sentiment_score
      • sentiment_polarity
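The outputs above are nested: themes appear at the document level and again under each entity. The sketch below walks that structure to produce a compact summary. It assumes the response has already been decoded from JSON into a Python dict whose keys match the field names listed above; the function name and summary shape are illustrative, not part of the Semantria or Diffbot APIs.

```python
from typing import Any


def summarize_analysis(doc: dict[str, Any]) -> dict[str, Any]:
    """Pull a compact summary out of one analyzed document.

    Assumes 'doc' is a decoded JSON object shaped like the output list above.
    Missing keys are tolerated and yield None or empty lists.
    """
    return {
        "id": doc.get("id"),
        # Document-level sentiment as a (score, polarity) pair
        "sentiment": (doc.get("sentiment_score"), doc.get("sentiment_polarity")),
        # Titles of document-level themes
        "themes": [t.get("title") for t in doc.get("themes", [])],
        "entities": [
            {
                "title": e.get("title"),
                "polarity": e.get("sentiment_polarity"),
                # Themes nested under each entity, per the list above
                "themes": [t.get("title") for t in e.get("themes", [])],
            }
            for e in doc.get("entities", [])
        ],
    }
```

Using `.get()` with defaults keeps the helper from failing when an optional section, such as `entities`, is absent for a short article.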


For more information on these outputs, please visit our Developer Portal.
support@semantria.com