Job Description: Web Scraper - Python

Hours: Monday – Friday, up to 45 hours per week (with some additional hours as required)

Work hours: IST shift (3:30 pm to 12:30 am).

Location: Cerebrum IT Park, Kalyani Nagar, Pune

Position: Permanent position, with a three-month probation period

Salary: As per industry standards

About the Company

Valasys Media is a global integrated marketing and sales process outsourcing company. It helps businesses build sales pipelines of qualified opportunities and shorten the sales cycles for their product and service portfolios. As part of our capabilities, we also help clients create market visibility, build awareness, and establish business relationships in new markets.

Job Brief

We are looking for a Python Web Scraper to help us expand and optimize our data and data flows. The ideal candidate will be responsible for extracting and ingesting data from websites and URLs using web crawling and scraping tools. In this role, you will own the creation of these tools, services, and workflows to improve data crawling, data scraping, and database management.

To do this job successfully, you need exceptional programming and web development skills. Knowledge of data science and software engineering is an added advantage. Your ultimate goal will be to maintain a steady data flow by scraping, crawling, and cleaning data as required.

Key Skills: Web Scraping, Web Crawling, Web and Windows Automation, Python/R, Selenium, NLP, Data Extraction, SQL/NoSQL, OpenCV, AutoIt, PyAutoGUI

Key Management Areas of Responsibility

Write programs that fetch data from multiple online sources and cleanse it.
Develop application frameworks that automate and maintain a constant flow of data from multiple sources.
Design and build web crawlers that scrape data and URLs using Python modules (Scrapy, Selenium, Requests, Beautiful Soup, Splash, etc.).
Create crawlers for all types of websites, regardless of technical roadblocks.
Manage crawlers to overcome technical challenges such as IP bans, geolocation bans, CAPTCHAs, and bot-blocking services.
Design Scrapy item pipelines that connect crawler output to a MySQL database.
Integrate the crawled and scraped data into our databases.
Build and maintain high-quality, reusable code.
Write product descriptions, brand support material, etc.
Automate manual processes, optimize data delivery, and redesign infrastructure for greater scalability.

Professional Skills & Qualification

Proven experience as a Web Scraper/Crawler or in a similar role.
Strong understanding and working knowledge of web crawlers, web scrapers, and other automation tools used to browse web content.
Knowledge of web scraping techniques and tools.
Strong knowledge of one or more of the available open-source and proprietary scraping frameworks.
Hands-on experience with SQL/NoSQL databases (MySQL, Postgres, Cassandra, MongoDB).
Good knowledge of, and coding experience in, one or more programming languages such as Python, Java, or JavaScript.
Experience creating Scrapy spiders for websites with CAPTCHAs, IP bans, geolocation bans, Cloudflare/Distil/Imperva firewalls, login-protected data, and dynamic content loaded through JS, REST APIs, GraphQL, etc.
Knowledge of object-oriented programming.
Experience with applications designed to display archived web content.
Experience with AWS cloud services (EC2).
Python tech stack (Python libraries: Scrapy, Requests, urllib, Beautiful Soup, Splash, Selenium, pandas).
2-4 years of experience and a Bachelor's degree in Computer Science, Engineering, Technology, or a related field are required.