Web Spider

Create a manifest.yml that describes the crawl, then run the spider with invana-bot.

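Creating the manifest.yml

The example below defines two spiders. bing_search starts from a Bing results page and follows result links (CSS selector .b_algo h2 a) into amazon_spider, which extracts the page title, the meta description, and the Open Graph description.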

init_spider:
  start_urls:
  - "https://www.bing.com/search?q=amazon.com:moto%20g5"
  spider_id: bing_search
spiders:
- spider_id: bing_search
  allowed_domains:
  - bing.com
  extractors:
  - extractor_type: MetaTagExtractor
    extractor_id: bing_search_result
  traversals:
  - traversal_id: amazon_spider_traversal
    selector_type: css
    selector_value: ".b_algo h2 a"
    next_spider_id: amazon_spider
    max_pages: 1
- spider_id: amazon_spider
  allowed_domains:
    - amazon.in
    - amazon.com
  extractors:
  - extractor_type: CustomContentExtractor
    extractor_id: seo_data2
    data_selectors:
    - selector_id: title
      selector: title
      selector_type: css
      selector_attribute: text
      multiple: false
    - selector_id: description
      selector: "//meta[@name='description']"
      selector_type: xpath
      selector_attribute: "@content"
      multiple: false
    - selector_id: og_description
      selector: "//meta[@property='og:description' or @name='og:description']"
      selector_type: xpath
      selector_attribute: "@content"
      multiple: false
  - extractor_type: MetaTagExtractor
    extractor_id: seo_data
settings:
  allowed_domains:
    - bing.com
    - amazon.in
    - amazon.com
  download_delay: 0
context:
  cti_id: tcl-agriculture
cti_id: tcl-agriculture
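
To make the data_selectors entries more concrete, here is a minimal sketch of what the title and description selectors would pick out of a page. It uses only Python's standard library and is not invana-bot's own extraction code, which this document does not show. The sample markup and the helper first_attribute are hypothetical, and the XPath is written as `.//meta[...]` rather than `//meta[...]` because ElementTree only accepts relative paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup, kept well-formed so the stdlib XML parser accepts it.
SAMPLE_PAGE = """<html>
  <head>
    <title>Moto G5 (sample)</title>
    <meta name="description" content="Sample product description" />
  </head>
  <body></body>
</html>"""


def first_attribute(root, xpath, attribute):
    """Return `attribute` of the first element matching `xpath`, or None."""
    element = root.find(xpath)
    return element.get(attribute) if element is not None else None


root = ET.fromstring(SAMPLE_PAGE)

extracted = {
    # Mirrors selector_id "title": first <title> element, text content.
    "title": root.findtext(".//title"),
    # Mirrors selector_id "description": the content attribute of the meta tag.
    "description": first_attribute(root, ".//meta[@name='description']", "content"),
}
print(extracted)
```

Each selector here returns a single value, which corresponds to `multiple: false` in the manifest. Real pages are rarely well-formed XML, so an HTML-aware parser would be needed outside of a toy example like this.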


Running the Spider

invana-bot --type=web
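
Before running the command, it can be worth confirming that every spider_id the manifest refers to (in init_spider and in each traversal's next_spider_id) is actually defined under spiders. The sketch below is a hypothetical helper, not part of invana-bot. It works on the manifest as a plain dict, and the inline dict is a trimmed copy of the example above.

```python
def find_undefined_spider_refs(manifest):
    """Return spider ids that are referenced but never defined under `spiders`."""
    spiders = manifest.get("spiders", [])
    defined = {spider["spider_id"] for spider in spiders}

    referenced = set()
    init_id = manifest.get("init_spider", {}).get("spider_id")
    if init_id:
        referenced.add(init_id)
    for spider in spiders:
        for traversal in spider.get("traversals", []):
            next_id = traversal.get("next_spider_id")
            if next_id:
                referenced.add(next_id)

    return sorted(referenced - defined)


# Trimmed copy of the example manifest, written as a plain dict.
manifest = {
    "init_spider": {"spider_id": "bing_search"},
    "spiders": [
        {
            "spider_id": "bing_search",
            "traversals": [
                {
                    "traversal_id": "amazon_spider_traversal",
                    "next_spider_id": "amazon_spider",
                }
            ],
        },
        {"spider_id": "amazon_spider"},
    ],
}

print(find_undefined_spider_refs(manifest))  # prints []
```

To check the real file instead of the inline dict, the manifest could be loaded with PyYAML's `yaml.safe_load`, assuming that package is installed.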