403 error when downloading .mp4, does not occur locally

Hey all. I’m testing an application here, Videomatch - a Hugging Face Space by Iskaj, and I’ve run into a discrepancy between running it locally and running it on Spaces.

Running this code:

import shutil
import urllib.request

url = "https://rr2---sn-5hne6nzk.googlevideo.com/videoplayback?expire=1665421952&ei=IP5DY7mjMYi71wLCu6XgCA&ip=85.145.175.77&id=o-AAvEo6X0IUC5HzIt58mG0v0l8qCrejbMz3jIkBliBY3q&itag=18&source=youtube&requiressl=yes&vprv=1&mime=video%2Fmp4&ns=-ZBIdaQxxI3AHnnxhNz1a60I&gir=yes&clen=382527&ratebypass=yes&dur=9.055&lmt=1660945791563080&fexp=24001373,24007246&c=WEB&txp=4438434&n=eIEl-8KvnxIFRjF&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAO7A4SkbzFwmQ2N6wBKGGfoBPhkX-8hQ0EZXngc-V4-2AiEA1zcKldSWoOvUD5S4VvTxC7kn9Nbo7DX4LiqPO0-RgOI%3D&redirect_counter=1&rm=sn-5hnesd7l&req_id=cb4d5150b17ea3ee&cms_redirect=yes&cmsv=e&ipbypass=yes&mh=6Q&mip=143.178.165.10&mm=31&mn=sn-5hne6nzk&ms=au&mt=1665404014&mv=m&mvi=2&pl=18&lsparams=ipbypass,mh,mip,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRgIhAPY0LyNdQ-k5EjKC2V9m1gRhOhQ6KxCl_9nhJeSNjmgGAiEA5gdstdSbaXsHmgpybV7LM3n6Brke5GH0xtR1FKRbOO4%3D"
# filepath is the destination path on disk, defined elsewhere in the app
req = urllib.request.Request(
    url=url,
    headers={'User-Agent': 'Mozilla/5.0'},
)
with urllib.request.urlopen(req, timeout=300) as f, open(filepath, 'wb') as fileout:
    shutil.copyfileobj(f, fileout, length=16*1024)

This gives back a 403 (access denied) error, but ONLY when running it in HF Spaces, not when running it locally. I already tried things like including the header {'User-Agent': 'Mozilla/5.0'}, because I thought the server might be blocking scraping or something.
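
To compare what the server actually returns in the two environments, something like the following could help. It is only a sketch: summarize_http_error and download are hypothetical helper names, not part of the app, and the code does not explain the 403 by itself. It just prints the status, reason, and response headers so the local run and the Spaces run can be put side by side.

```python
import shutil
import urllib.error
import urllib.request


def summarize_http_error(err):
    """Collect the HTTPError fields worth comparing across environments."""
    return {
        "status": err.code,
        "reason": err.reason,
        "headers": dict(err.headers.items()) if err.headers else {},
    }


def download(url, filepath, timeout=300):
    """Download url to filepath; on an HTTP error, print a summary and re-raise."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as f, open(filepath, "wb") as fileout:
            shutil.copyfileobj(f, fileout, length=16 * 1024)
    except urllib.error.HTTPError as err:
        # The response headers of the error often hint at who rejected the request.
        print(summarize_http_error(err))
        raise
```

Calling download(url, filepath) in both places and diffing the printed dictionaries would at least show whether the rejection looks the same from both sides.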

Anyone know why this discrepancy occurs between running locally and running via Spaces?

Hi!

The problem might be that Spaces blocks all network requests that are not made to ports 80, 443, or 8080.

See the networking section here: Spaces Overview
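
A quick way to check whether a given URL would even be affected by that restriction is to look at its effective port. This is a minimal sketch: the allowed-port set is copied from the sentence above, and effective_port and uses_allowed_port are hypothetical helper names, not part of any Spaces API.

```python
from urllib.parse import urlsplit

# Ports named in the reply above as the ones Spaces lets through.
ALLOWED_PORTS = {80, 443, 8080}
# Default ports used when the URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port of url, or the default port for its scheme."""
    parts = urlsplit(url)
    return parts.port or DEFAULT_PORTS.get(parts.scheme)


def uses_allowed_port(url):
    """True if a request to url would go to one of the allowed ports."""
    return effective_port(url) in ALLOWED_PORTS


print(uses_allowed_port("https://rr2---sn-5hne6nzk.googlevideo.com/videoplayback?itag=18"))
```

A plain https URL with no explicit port resolves to 443, so under this check the video URL from the question would not be hit by the port rule.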

I don’t think that is the case. I just successfully downloaded the file from within a Space with

wget "https://rr2---sn-5hne6nzk.googlevideo.com/videoplayback?expire=1665421952&ei=IP5DY7mjMYi71wLCu6XgCA&ip=85.145.175.77&id=o-AAvEo6X0IUC5HzIt58mG0v0l8qCrejbMz3jIkBliBY3q&itag=18&source=youtube&requiressl=yes&vprv=1&mime=video%2Fmp4&ns=-ZBIdaQxxI3AHnnxhNz1a60I&gir=yes&clen=382527&ratebypass=yes&dur=9.055&lmt=1660945791563080&fexp=24001373,24007246&c=WEB&txp=4438434&n=eIEl-8KvnxIFRjF&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAO7A4SkbzFwmQ2N6wBKGGfoBPhkX-8hQ0EZXngc-V4-2AiEA1zcKldSWoOvUD5S4VvTxC7kn9Nbo7DX4LiqPO0-RgOI%3D&redirect_counter=1&rm=sn-5hnesd7l&req_id=cb4d5150b17ea3ee&cms_redirect=yes&cmsv=e&ipbypass=yes&mh=6Q&mip=143.178.165.10&mm=31&mn=sn-5hne6nzk&ms=au&mt=1665404014&mv=m&mvi=2&pl=18&lsparams=ipbypass,mh,mip,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRgIhAPY0LyNdQ-k5EjKC2V9m1gRhOhQ6KxCl_9nhJeSNjmgGAiEA5gdstdSbaXsHmgpybV7LM3n6Brke5GH0xtR1FKRbOO4%3D"

Could you maybe try adding a GET method to your request?


I think it should be a GET already. See urllib.request — Extensible library for opening URLs — Python 3.12.0 documentation

method should be a string that indicates the HTTP request method that will be used (e.g. 'HEAD'). If provided, its value is stored in the method attribute and is used by get_method(). The default is 'GET' if data is None or 'POST' otherwise. Subclasses may indicate a different default method by setting the method attribute in the class itself.
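
The default described in that passage can be checked without sending anything over the network, since get_method() only inspects the Request object. A small sketch, with example.com URLs as placeholders:

```python
import urllib.request

# No data argument: the method defaults to GET.
req_get = urllib.request.Request("https://example.com/video.mp4")

# With a data payload, the default switches to POST.
req_post = urllib.request.Request("https://example.com/upload", data=b"payload")

# An explicit method overrides the default either way.
req_head = urllib.request.Request("https://example.com/video.mp4", method="HEAD")

print(req_get.get_method(), req_post.get_method(), req_head.get_method())
# prints: GET POST HEAD
```

So a Request built with only url and headers is already a GET, and passing method='GET' explicitly should not change what is sent.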

Or do you mean actually doing it via wget?

But I did manage to solve it for my use case by downloading the video (which is a YouTube video) directly using pytube instead of via this provided link. Just in case someone out there is trying the same thing and hits this issue.
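
For anyone who wants to try the same route, a rough sketch of a pytube download is below. It is not the exact code from the Space: download_with_pytube is a hypothetical helper name, the watch URL passed to it is whatever YouTube page URL you start from, and picking the first progressive mp4 stream is just one reasonable choice. pytube is a third-party package (pip install pytube).

```python
def download_with_pytube(watch_url, output_dir, filename):
    """Download a YouTube video as mp4 with pytube and return the saved path."""
    # Imported here so this module still loads where pytube is not installed.
    from pytube import YouTube

    yt = YouTube(watch_url)
    # Progressive streams carry audio and video in a single file.
    stream = yt.streams.filter(progressive=True, file_extension="mp4").first()
    if stream is None:
        raise RuntimeError("no progressive mp4 stream found for " + watch_url)
    return stream.download(output_path=output_dir, filename=filename)
```

The point of going through pytube is that it works from the YouTube page URL each time, instead of reusing a googlevideo.com link that was generated earlier.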

I tried to reproduce the error from another Space and was able to successfully download the file using urllib.

But changing it to

url = "https://rr2---sn-5hne6nzk.googlevideo.com/videoplayback?expire=1665421952&ei=IP5DY7mjMYi71wLCu6XgCA&ip=85.145.175.77&id=o-AAvEo6X0IUC5HzIt58mG0v0l8qCrejbMz3jIkBliBY3q&itag=18&source=youtube&requiressl=yes&vprv=1&mime=video%2Fmp4&ns=-ZBIdaQxxI3AHnnxhNz1a60I&gir=yes&clen=382527&ratebypass=yes&dur=9.055&lmt=1660945791563080&fexp=24001373,24007246&c=WEB&txp=4438434&n=eIEl-8KvnxIFRjF&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAO7A4SkbzFwmQ2N6wBKGGfoBPhkX-8hQ0EZXngc-V4-2AiEA1zcKldSWoOvUD5S4VvTxC7kn9Nbo7DX4LiqPO0-RgOI%3D&redirect_counter=1&rm=sn-5hnesd7l&req_id=cb4d5150b17ea3ee&cms_redirect=yes&cmsv=e&ipbypass=yes&mh=6Q&mip=143.178.165.10&mm=31&mn=sn-5hne6nzk&ms=au&mt=1665404014&mv=m&mvi=2&pl=18&lsparams=ipbypass,mh,mip,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRgIhAPY0LyNdQ-k5EjKC2V9m1gRhOhQ6KxCl_9nhJeSNjmgGAiEA5gdstdSbaXsHmgpybV7LM3n6Brke5GH0xtR1FKRbOO4%3D"
req = urllib.request.Request(
    url=url,
    headers={'User-Agent': 'Mozilla/5.0'},
    method='GET',
)
with urllib.request.urlopen(req, timeout=300) as f, open(filepath, 'wb') as fileout:
    shutil.copyfileobj(f, fileout, length=16*1024)

didn’t solve it, at least.