PowerShell: Download Movies from YouTube with Invoke-WebRequest and youtube-dl

Disclaimer: I have no knowledge of the copyright status (public domain or otherwise) of individual movies uploaded to YouTube. Use good judgement.

Did you know that people apparently upload entire movies to YouTube? In fact, there’s a subreddit dedicated to finding these movies. I’d rather have the movie files themselves than rely on a browser or app to watch them, so I wrote some quick PowerShell code to grab those files.

Before we break this down – if you’ve come here planning to download an entire YouTube channel or playlist, you’d be better served by reading up on youtube-dl’s native functionality. It’s a very powerful tool, and you can probably do what you want with the right parameters alone, without involving PowerShell at all.
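For instance, youtube-dl will take a playlist or channel URL directly. A quick sketch – the URLs below are placeholders, so substitute your own:

```shell
# Download an entire playlist, numbering files in playlist order.
# -i (--ignore-errors) skips videos that fail instead of aborting the run.
youtube-dl -i -o "%(playlist_index)s - %(title)s.%(ext)s" "https://www.youtube.com/playlist?list=PLACEHOLDER"

# Download everything a channel has uploaded, sorted into a folder per uploader.
youtube-dl -i -o "%(uploader)s/%(title)s.%(ext)s" "https://www.youtube.com/user/SomeChannel"
```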

There are a couple of pieces here. One, we’ve got a list of these movies at /r/fullmoviesonyoutube/, and we need to scrape the YouTube links. Once we have those links, we need to download the movie files, which is where youtube-dl comes in. youtube-dl is a really powerful command-line executable that downloads YouTube video files. Get the links, pipe them to youtube-dl, boom – lots of movies.

If you’re new to web scraping, PowerShell’s Invoke-WebRequest is a great place to start. Below, we use it to extract all the links from the starting page (the subreddit home), then check whether the page has a “Next” button. If it does, we navigate to the next page and extract those links as well, repeating until there is no “Next” button – meaning we’ve reached the end.

$youtubelinks = @()
#Seed $nextbutton with a non-null value so the pagination loop runs at least once.
$nextbutton = $true
#The page where we start looking for YouTube links.
$url = "https://www.reddit.com/r/fullmoviesonyoutube/"

#At the end of each pass, we check for a link whose text contains "next ›".
#If there is one, we Invoke-WebRequest its href and do it all again.
#When no such link is found, $nextbutton is $null and the loop ends.
while ($null -ne $nextbutton)
    {
    $alllinks = (Invoke-WebRequest $url).Links
    #-like returns $false for links with no class or innerText property, instead of throwing.
    $youtubelinks += $alllinks | Where-Object {$_.class -like "*title may-blank outbound*"}
    #-First 1 guards against the "next ›" link appearing more than once on a page.
    $nextbutton = $alllinks | Where-Object {$_.innerText -like "*next ›*"} | Select-Object -First 1
    $url = $nextbutton.href
    }
#Write the scraped hrefs to a text file.
$youtubelinks.href | Out-File youtubemovies.txt

An earlier version of this loop threw an error on every pass while searching for the “next” link: not every link object exposes an innerText (or class) property, and calling .Contains() on $null fails. The -like comparisons above simply return $false in that case. The code still doesn’t check whether the videos exist, or confirm anything about them – it just sends every link whose CSS class contains “title may-blank outbound” (these are specifically the reddit item links) to $youtubelinks, and youtube-dl can manage everything else. Writing the links to a file lets us test the second part on its own. If you’d rather jump straight into downloading, the text file I scraped can be downloaded here.

You’ll need youtube-dl installed, and the easiest way to get it on Windows is through Chocolatey. Start an elevated PowerShell session (“Run as Administrator”) and run the following command:

iwr https://chocolatey.org/install.ps1 -UseBasicParsing | iex

After it completes (and remember, it won’t work if you don’t run PowerShell as Administrator), run the following code:

choco install youtube-dl

That will do it – if you’re familiar with Debian derivatives, Chocolatey is a lot like apt-get. Here’s what the second part of the script should look like:
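One aside before we continue: youtube-dl is updated frequently to keep pace with YouTube’s changes, and the apt-get comparison holds for upgrades too – it’s one command:

```shell
# Upgrade youtube-dl to the latest packaged version (run as Administrator);
# -y answers the confirmation prompt automatically.
choco upgrade youtube-dl -y
```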

#Load the scraped links into $youtubemovies
$youtubemovies = Get-Content youtubemovies.txt

#ForEach loop to send each line in the file to youtube-dl
ForEach($youtubemovie in $youtubemovies)
    {
    youtube-dl -o 'E:/Youtube/YouTubemovies/%(title)s.%(ext)s' $youtubemovie
    }

In this second part, we use youtube-dl’s default settings and only specify where the files should be saved. If you don’t give a path, youtube-dl uses the current working directory (which you probably don’t want). I’ve got an external drive (E:), so that’s what I’m using here. You’ll also notice the %(title)s and %(ext)s placeholders in the name – these are youtube-dl output template fields, and you could choose a more descriptive pattern. The important part is that each pass of the loop hands youtube-dl the next URL; it does the rest.
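If you do want more descriptive names, the output template accepts other fields besides title and ext. For example (the URL here is just a stand-in for one of the scraped links):

```shell
# Name files as "<uploader> - <title> [<video id>].<extension>"
youtube-dl -o 'E:/Youtube/YouTubemovies/%(uploader)s - %(title)s [%(id)s].%(ext)s' "https://www.youtube.com/watch?v=VIDEO_ID"
```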

I left the second portion of the script running for the better part of a day, and it had already pulled down over 250GB of files – so be warned if you’re on a metered connection.
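A run that long is likely to get interrupted at some point. youtube-dl can pick up where it left off if you add a download archive – a sketch, assuming the same output path as above (the archive filename is arbitrary, and the URL is a stand-in):

```shell
# --download-archive records the ID of each finished video in archive.txt,
# so a rerun over the same list of links skips everything already downloaded.
youtube-dl -i --download-archive archive.txt -o 'E:/Youtube/YouTubemovies/%(title)s.%(ext)s' "https://www.youtube.com/watch?v=VIDEO_ID"
```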

Update: It’s finished – 556 movies (well, files anyway), 273GB total.
