Lynda.com Scraper

Using JavaScript and Python

Posted on 2016-02-22 21:46:00 in webdev, javascript, python

See it on GitHub

Lynda.com has a pretty good collection of videos that teach you how to do things. As a BYU student, I have free access to all of those videos. They have been very helpful in teaching me about web development.

The problem is that only 3 computers in the library can access Lynda. I don't really like the chairs or the location of the room that those computers are in. I wanted to be able to access Lynda from home.

The solution I found isn't as great as I want it to be yet, but it works. It requires just a few steps:

Run this script (for example, from the browser's developer console) on the page that includes the videos you want to download:

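var i = 0;
var INTERVAL_MS = 5000; // Change to speed up or slow down time interval

// Click the next play button; return false once none are left.
function clickPlay() {
    var playButtons = $('.video-cta');
    if (i >= playButtons.length) {
        return false;
    }
    playButtons[i].click();
    i++;
    return true;
}

// Click one play button every INTERVAL_MS until all have been clicked.
function playVideos() {
    setTimeout(function () {
        if (clickPlay()) {
            playVideos();
        }
    }, INTERVAL_MS);
}

playVideos();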

While that runs, capture the HTTP requests being made (I used "HTTP Trace"). Copy the output from HTTP Trace into requests.txt, an initially blank text file that the Python script reads.

Run this Python script, which parses the requests into clean URLs:

import sys

urls = []

try:
    # Collect the URL from every request line that fetches a video file.
    with open("requests.txt", "r") as f:
        for line in f:
            if "GET https://files" in line or "GET http://files" in line:
                pieces = line.split()
                urls.append(pieces[1])
except IOError:
    print("Could not open requests.txt")
    sys.exit(1)

try:
    # Write one URL per line so a download manager can import the list.
    with open("urls.txt", "w") as out:
        out.write("\n".join(urls) + "\n")
    print("Successfully parsed requests.txt and wrote urls to urls.txt")
except IOError:
    print("Could not open or write to urls.txt")
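To make the filtering step easier to see, here is a minimal sketch of the same idea as a function, run on a few made-up request lines. The sample lines, the example.com hosts, and the function name extract_file_urls are assumptions for illustration; the real HTTP Trace output may be formatted differently. Unlike the script above, this version also skips URLs it has already seen.

```python
def extract_file_urls(lines):
    """Return the URLs of GET requests whose host starts with "files"."""
    urls = []
    for line in lines:
        tokens = line.split()
        # Look at each token together with the one that follows it.
        for method, target in zip(tokens, tokens[1:]):
            is_file_url = target.startswith(("https://files", "http://files"))
            if method == "GET" and is_file_url and target not in urls:
                urls.append(target)
    return urls


# Hypothetical trace lines; the real HTTP Trace output may look different.
sample = [
    "GET https://files.example.com/courses/101/intro.mp4 HTTP/1.1",
    "GET https://www.example.com/player.js HTTP/1.1",
    "GET http://files.example.com/courses/101/setup.mp4 HTTP/1.1",
    "GET https://files.example.com/courses/101/intro.mp4 HTTP/1.1",
]

for url in extract_file_urls(sample):
    print(url)
```

Only the two distinct "files" URLs are printed; the player.js request and the repeated intro.mp4 request are dropped.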

Now open up the "DownThemAll!" extension in Firefox. (Or pick a different download manager; there are several.)

Open up the "DownThemAll!" Manager and import your text file containing the URLs (urls.txt).

Now sit back and relax while Firefox downloads all your videos.
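If you would rather skip the browser extension, the download step could also be sketched in Python. This is not what I used, just a rough alternative. It assumes urls.txt holds one URL per line and that the file URLs can be fetched without extra cookies or headers, which may not be true. The names filename_from_url and download_all and the "videos" folder are made up for this example.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def filename_from_url(url, index):
    """Pick a local file name from the URL path, ignoring any query string."""
    name = os.path.basename(urlsplit(url).path)
    # Fall back to a numbered name if the path has no file component.
    return name or "video_{:03d}.mp4".format(index)


def download_all(url_file="urls.txt", target_dir="videos"):
    """Download every URL listed in url_file (one per line) into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for index, url in enumerate(urls, start=1):
        destination = os.path.join(target_dir, filename_from_url(url, index))
        urlretrieve(url, destination)
        print("Saved " + destination)
```

Calling download_all() from the folder that contains urls.txt would save each file into a "videos" subfolder, one after another.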
