Important: this is not a tutorial.

You could also check out Playwright, a newer end-to-end testing tool.
It should work nicely for any scraping purposes.
It looks nice and promising. It was recommended to me by some people in the field.

Being able to extract data from any website is a nice skill to have, especially if you must deal with larger pieces of data that are scattered all around the site. I’m not an expert, but I will happily share what I wish I had known before I spent almost 3 hours trying to figure out how stuff works.

For this task I used Python, but next time I’ll probably use Java, just to have a few more Java apps on my profile. For Python you could use Scrapy, which is a nice framework, or Selenium, or Robot Framework. If the content of the website is dynamic and you must wait for things to be rendered, I would choose Selenium. Otherwise, if your content is static and loaded straight away, like on most old websites, use Scrapy.

Tips for using Scrapy: you can easily log in to a web page with it. Look at the Scrapy form trick snippet below.
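
Before getting to Scrapy, here is a rough way to answer the static-versus-dynamic question. The idea: download the page without a browser and check whether some text you can see in the browser is already in the raw HTML. This is only a sketch using the standard library; the names fetch_raw_html and suggest_tool are made up for illustration, and the check is a heuristic, not a guarantee.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url, timeout=10.0):
    """Download the page the way a plain HTTP client sees it (no JavaScript runs)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def suggest_tool(raw_html, expected_text):
    """Suggest 'scrapy' if the text is already in the raw HTML, else 'selenium'."""
    if expected_text.lower() in raw_html.lower():
        return "scrapy"
    return "selenium"
```

You would call it as suggest_tool(fetch_raw_html(your_url), "some text you see in the browser"). If the answer is "selenium", the text is most likely rendered later by JavaScript. Keep in mind that text hidden inside embedded JSON would also count as a match, so treat the result as a hint.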

Scrapy form trick

import scrapy
from scrapy import FormRequest, Spider


class TheScraper(Spider):
    name = 'simp'
    start_urls = ['']

    def parse(self, response, **kwargs):
        # Submit the login form; scrape_page is called with the server's answer
        yield FormRequest(url=self.start_urls[0],
                          formdata={
                              'email': 'login',
                              'password': 'pass'
                          },
                          callback=self.scrape_page)

    def scrape_page(self, response):
        print("Logged in !")
        # Do stuff
        url = ''
        yield scrapy.Request(url=url, callback=self.parse_100)

    def parse_100(self, response):
        # Extract the data you need from the response here
        pass

Selenium also has a nice feature called headless mode. Use it when you’re sure your script works as it should and all you really want is to hide the browser window and just get the data.

Selenium headless mode

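from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def getOptions():
    options = Options()
    # Headless: no browser window is shown. Older Selenium versions also accepted options.headless = True
    options.add_argument("--headless")
    return options


browser = webdriver.Chrome(options=getOptions())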

Selenium basic setup and XPath selectors

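# Getting an XPath selector of the element
# Import these libraries -> pip install selenium ....
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Download the Chrome webdriver and put it in the project directory.
if __name__ == "__main__":
    # Load driver defaults
    # browser = webdriver.Chrome()

    # Or load with options (headless); getOptions() comes from the headless snippet above
    browser = webdriver.Chrome(options=getOptions())

    # Open desired page
    browser.get('')  # URL left blank here, as in the Scrapy snippet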

WebDriverWait – makes the browser wait, so our dynamically rendered content is there.
EC (expected_conditions) – tells Selenium which condition you want to wait for.
By – lets you use XPath or CSS selectors to find the target.
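
To make the waiting part less magical, here is a simplified pure-Python illustration of the idea behind WebDriverWait: keep checking a condition until it returns something truthy or the time runs out. This is not Selenium's code. The real class also takes the driver and can ignore certain exceptions. The names wait_until and fake_element_present are made up for this sketch.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns something truthy or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose content only shows up on the third check
calls = {"count": 0}


def fake_element_present():
    calls["count"] += 1
    return "element" if calls["count"] >= 3 else None


found = wait_until(fake_element_present, timeout=2.0, poll_interval=0.01)
print(found)
```

In Selenium the condition is one of the EC helpers, and the thing returned is the element (or list of elements) you were waiting for.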

You will probably be able to find more about XPath selectors on this website.
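
If you want to play with selector shapes without starting a browser, Python's built-in xml.etree.ElementTree understands a small subset of XPath. The sketch below uses made-up markup and is not Selenium. ElementTree does not support functions like contains(), and real HTML is often not well-formed enough for it, so treat this only as an offline playground.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for a page with video links
SAMPLE = """
<div>
  <h1 class="title">My playlist</h1>
  <a class="block" href="/video/1"><div>First video</div></a>
  <a class="block" href="/video/2"><div>Second video</div></a>
  <a class="other" href="/about"><div>About</div></a>
</div>
"""

root = ET.fromstring(SAMPLE)

# Select one element: the first <h1> whose class attribute equals "title"
main_title = root.find(".//h1[@class='title']").text

# Select many elements: every <a> whose class attribute equals "block"
links = [a.get("href") for a in root.findall(".//a[@class='block']")]

# Relative selector: search only inside one of the matched elements
first_label = root.find(".//a[@class='block']").find("./div").text

print(main_title, links, first_label)
```

The same three moves (select one, select many, search relative to a match) are what the Selenium snippets in this post do with By.XPATH.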

Selecting one

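# Let's try to select one attribute
    # XPath left blank; put the selector of your element here
    main_title = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, ''))).get_attribute('innerText')
    # For the last part you could use any attribute you want, like 'innerText' or 'href'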

Selecting multiple

        try:
            # List of the videos to process
            the_list = WebDriverWait(browser, 10).until(
                EC.presence_of_all_elements_located((By.XPATH, ".//a[contains(@class,'block')]")))

            for item in the_list:
                # innerText holds the title on the first line and the duration on the second
                lines = item.find_element(By.XPATH, ".//div").get_attribute('innerText').split("\n")
                title = lines[0]
                file_title = sanitize(title)  # sanitize() is your own helper for file-safe names

                link = item.get_attribute('href')
                duration = lines[1][1:]  # drop the first character of the duration line

        except Exception as e:
            print(f"Could not read the video list: {e}")
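
The loop above calls sanitize() without showing it, so here is one possible version, plus the title and duration splitting pulled out into a function you can try without a browser. Both are my own guesses at what such helpers could look like: split_card_text is a made-up name, and the sample text assumes the duration line starts with one extra character (a space here) that gets dropped.

```python
import re


def sanitize(title, replacement="_"):
    """Return a version of title that is safe to use as a file name."""
    # Replace characters that common file systems reject in file names
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', replacement, title)
    # Collapse runs of whitespace and trim dots and spaces from the ends
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned or "untitled"


def split_card_text(inner_text):
    """Split innerText into (title, duration) the same way the loop above does."""
    lines = inner_text.split("\n")
    title = lines[0]
    # The first character of the second line is dropped, as in the original loop
    duration = lines[1][1:] if len(lines) > 1 else ""
    return title, duration


title, duration = split_card_text("Intro: What/Why?\n 12:34")
print(sanitize(title), duration)
```

Adjust the character list in sanitize() to whatever your file system actually complains about.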