Feeds Spider

Create a manifest.yml file with the configuration below, then run the spider.

cti_id: news.ycombinator.com
init_spider:
  spider_id: default
  start_urls:
    - https://news.ycombinator.com/rss
spiders:
- spider_id: default
  allowed_domains:
  - news.ycombinator.com
  iterator: xml
  itertag: item
  extractors:
  - extractor_id: stories
    extractor_type: CustomContentExtractor
    data_selectors:
    - selector_id: title
      selector: title
      selector_type: xpath
      selector_attribute: text()
      data_type: StringField
    - selector_id: link
      selector: link
      selector_type: xpath
      selector_attribute: text()
      data_type: StringField
    - selector_id: published_date
      selector: pubDate
      selector_type: xpath
      selector_attribute: text()
      data_type: StringField
settings:
  allowed_domains:
  - "news.ycombinator.com"
  download_delay: 0
context:
  author: https://github.com/rrmerugu
  description: Crawler that scrapes ycombinator news
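
To make the manifest easier to read, here is a rough Python sketch of what the `itertag: item` setting and the three `data_selectors` describe: walk every `<item>` node in the feed and read the text of its `title`, `link`, and `pubDate` children. This is only an illustration using the standard library, not Invana Bot's actual implementation. The sample feed, the `SELECTORS` mapping, and the `extract_stories` helper are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for https://news.ycombinator.com/rss
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Hacker News</title>
    <link>https://news.ycombinator.com/</link>
    <item>
      <title>Example story one</title>
      <link>https://example.com/one</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Example story two</title>
      <link>https://example.com/two</link>
      <pubDate>Mon, 01 Jan 2024 11:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>
"""

# selector_id -> child tag, mirroring the data_selectors in the manifest
# (assumed mapping for illustration only)
SELECTORS = {
    "title": "title",
    "link": "link",
    "published_date": "pubDate",
}


def extract_stories(rss_text, itertag="item"):
    """Return one dict per <itertag> node, keyed by selector_id."""
    root = ET.fromstring(rss_text)
    stories = []
    for node in root.iter(itertag):
        # findtext returns the child's text, or None if the child is missing
        story = {sid: node.findtext(tag) for sid, tag in SELECTORS.items()}
        stories.append(story)
    return stories


if __name__ == "__main__":
    for story in extract_stories(SAMPLE_RSS):
        print(story)
```

Each resulting dict has the keys `title`, `link`, and `published_date`, all as plain strings, which matches the `StringField` data type declared for every selector. The channel-level `<title>` and `<link>` are skipped because only `<item>` nodes are iterated.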


Running the Spider

invana-bot --type=rss