Web Spider
Create a manifest.yml that defines the spiders, then run the crawl with the invana-bot command.
Creating the manifest.yml
init_spider:
  start_urls:
    - "https://www.bing.com/search?q=amazon.com:moto g5"
  spider_id: bing_search
spiders:
  - spider_id: bing_search
    allowed_domains:
      - bing.com
    extractors:
      - extractor_type: MetaTagExtractor
        extractor_id: bing_search_result
    traversals:
      - traversal_id: amazon_spider_traversal
        selector_type: css
        selector_value: ".b_algo h2 a"
        next_spider_id: amazon_spider
        max_pages: 1
  - spider_id: amazon_spider
    allowed_domains:
      - amazon.in
      - amazon.com
    extractors:
      - extractor_type: CustomContentExtractor
        extractor_id: seo_data2
        data_selectors:
          - selector_id: title
            selector: title
            selector_type: css
            selector_attribute: text
            multiple: false
          - selector_id: description
            selector: "//meta[@name='description']"
            selector_type: xpath
            selector_attribute: "@content"
            multiple: false
          - selector_id: og_description
            selector: "//meta[@name='og:description']"
            selector_type: xpath
            selector_attribute: "@content"
            multiple: false
      - extractor_type: MetaTagExtractor
        extractor_id: seo_data
settings:
  allowed_domains:
    - bing.com
    - amazon.in
    - amazon.com
  download_delay: 0
context:
  cti_id: tcl-agriculture
cti_id: tcl-agriculture
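
The manifest links spiders by id: init_spider names the spider that starts the crawl, and each traversal hands the links it selects to the spider named in next_spider_id. A mistyped id is easy to miss, so the following is a minimal Python sketch of a sanity check you could run before crawling. The check_manifest function is hypothetical and not part of invana-bot; whether invana-bot performs a similar validation itself is not stated here. The example uses a trimmed copy of the manifest written as a Python dict; in practice you might load the real file with PyYAML's yaml.safe_load instead.

```python
def check_manifest(manifest):
    """Return a list of problems found in a parsed manifest (hypothetical helper)."""
    problems = []
    spiders = manifest.get("spiders", [])
    spider_ids = {spider.get("spider_id") for spider in spiders}

    # The crawl must start from a spider that is actually defined.
    start = manifest.get("init_spider", {}).get("spider_id")
    if start not in spider_ids:
        problems.append(f"init_spider.spider_id {start!r} is not defined under spiders")

    # Every traversal must hand its links to a defined spider.
    for spider in spiders:
        for traversal in spider.get("traversals", []):
            target = traversal.get("next_spider_id")
            if target not in spider_ids:
                problems.append(
                    f"traversal {traversal.get('traversal_id')!r} "
                    f"points to unknown spider {target!r}"
                )
    return problems


# Trimmed copy of the manifest, keeping only the keys the check reads.
manifest = {
    "init_spider": {
        "start_urls": ["https://www.bing.com/search?q=amazon.com:moto g5"],
        "spider_id": "bing_search",
    },
    "spiders": [
        {
            "spider_id": "bing_search",
            "traversals": [
                {
                    "traversal_id": "amazon_spider_traversal",
                    "next_spider_id": "amazon_spider",
                }
            ],
        },
        {"spider_id": "amazon_spider"},
    ],
}

for problem in check_manifest(manifest):
    print(problem)
```

For the manifest shown, the check finds nothing and prints no lines. Renaming one of the spider ids without updating the matching next_spider_id would produce one message per broken reference.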
Running the Spider
invana-bot --type=web