In the concluding part of a two-part series on protecting your images, we explain how to stop hotlinking, disable right-click downloading, and make images invisible to website-scraping bots.
In the first part of this guide, we looked at how to protect images on social media, as well as watermarking, guarding against screenshot attempts, and adding copyright information.
In this concluding article, we look at some of the more advanced measures used to protect images. We also look at their main shortcomings, which you should consider if you plan to implement them manually.
Hotlinking
Hotlinking is a problem that dates back to the beginning of the internet, when hosting and bandwidth were expensive.
Instead of downloading an image and hosting it themselves, some website owners would display it on their own pages by linking directly to its original location. The image would then load from the owner’s server, consuming bandwidth and storage that the image owner was paying for.
Such hotlinking is often done by bots that automatically create websites using content aggregated from other websites. The reason? By scraping this content, whoever runs the bot can either make money from banner ads displayed alongside it, or pass the content off as their own.
Not all hotlinking is bad, of course. The best example of legitimate hotlinking is Google Images, something many of us rely on without really considering how it works. When Google indexes your images, it caches a small thumbnail that can be displayed in search results. Once the thumbnail is clicked, however, the larger image is loaded not from Google but from the image owner’s website.
On Apache web servers, hotlinking can be prevented with rules in the ‘.htaccess’ configuration file. SmartFrame users can also block and control hotlinking through the SmartFrame Admin Panel, where they can also customize how thumbnails are displayed in search results.
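For readers who want to see what a hotlink rule actually checks, here is a minimal Python sketch of the logic that typical ‘.htaccess’ hotlink rules express: look at the Referer header of an image request and refuse it when the request comes from a page on another domain. The allow-list and hostnames are hypothetical placeholders, and because the Referer header can be stripped or forged, this is a deterrent rather than a guarantee.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: replace with your own site's hostnames.
ALLOWED_HOSTS = frozenset({"example.com", "www.example.com"})


def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if an image request appears to be embedded by another site.

    Requests with no Referer (direct visits, privacy settings) are allowed,
    as are requests whose Referer host is on the allow-list. Anything else
    is treated as a hotlink and could be refused by the server.
    """
    if not referer:
        return False
    host = urlsplit(referer).hostname
    return host not in allowed_hosts


print(is_hotlink(None))                               # False
print(is_hotlink("https://www.example.com/gallery"))  # False
print(is_hotlink("https://scraper-site.test/page"))   # True
```

Allowing requests without a Referer is a deliberate choice: many browsers and privacy tools omit the header, and blocking those requests would break images for some legitimate visitors.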
Robots meta directives
A common way to instruct web crawlers is with a ‘robots.txt’ file at the root of the website, or a robots meta tag in the head of a webpage. These contain directives that tell bots whether they may crawl the website or index its pages.
There are around a dozen different directives, but the most commonly used are ‘noindex’ and ‘nofollow’.
One thing to bear in mind is that this is merely a polite request, not protection. Reputable search engines honor it, while others may ignore it and crawl the website anyway to scrape its content, if that is their intention.
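To illustrate how voluntary these directives are, the Python sketch below uses the standard library’s urllib.robotparser to evaluate a hypothetical robots.txt, and a small HTML parser to read a robots meta tag. Both checks only happen because the crawler chooses to run them; a scraper can simply skip them.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all bots to stay out of /images/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching; nothing forces it to.
print(rp.can_fetch("ExampleBot", "https://example.com/images/photo.jpg"))  # False
print(rp.can_fetch("ExampleBot", "https://example.com/about"))             # True


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(sorted(parser.directives))  # ['nofollow', 'noindex']
```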
Programmatic content scraping
This is almost always a malicious activity, which involves downloading the entirety of a website’s content, usually with the goal of cloning the site.
There are a number of reasons why someone may want to do this. As with hotlinking, a cloned website can be used to generate fake traffic and banner-ad revenue, or to sell counterfeit products. It can also be used for phishing, in which sensitive information, including debit and credit card details, is obtained from unsuspecting shoppers.
Detecting such activity is difficult, especially since the scraping can be run from an ordinary computer rather than a server, with the software impersonating a human user.
Pages that use so-called lazy loading, or that are otherwise dynamically generated, make it harder for bots to find images, because some user interaction, such as scrolling, is required before the images are displayed. Such interactions are more difficult for a bot to generate.
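A simplified illustration of why lazy loading trips up naive bots: many lazy-loading scripts keep the real image address in an attribute such as data-src and copy it into src only when the image scrolls into view. The markup below is a hypothetical example of that convention, and the Python parser stands in for a scraper that reads only src. Note that the browser’s native loading="lazy" attribute does not work this way, as it leaves the real address in src.

```python
from html.parser import HTMLParser

# Hypothetical lazy-loaded markup: JavaScript would copy data-src into src
# once each image scrolls into view.
PAGE = """
<img src="/img/placeholder.gif" data-src="/img/photo-1.jpg" class="lazy">
<img src="/img/placeholder.gif" data-src="/img/photo-2.jpg" class="lazy">
"""


class SrcCollector(HTMLParser):
    """A naive scraper that only reads the src attribute of <img> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


naive = SrcCollector()
naive.feed(PAGE)
print(naive.urls)  # ['/img/placeholder.gif', '/img/placeholder.gif']
```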
Unfortunately, there is another way to scrape images, and it requires little technical knowledge or effort. Legitimate browser-extension marketplaces offer free tools that download every image on a webpage. Some will even follow links on the page in an attempt to download the whole website.
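The link-following behavior of such tools boils down to a breadth-first crawl. This Python sketch, assuming a hypothetical in-memory SITE dictionary in place of real HTTP requests, visits every page on the starting domain, follows same-domain links, and collects the absolute address of every image it finds.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical in-memory "website" standing in for real HTTP responses.
SITE = {
    "https://example.com/": '<a href="/gallery">Gallery</a><img src="/logo.png">',
    "https://example.com/gallery": '<img src="/photos/1.jpg"><a href="https://other.test/">Out</a>',
}


class LinkAndImageParser(HTMLParser):
    """Records every <a href> and <img src> value on a page."""

    def __init__(self):
        super().__init__()
        self.links, self.images = [], []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            self.links.append(attr["href"])
        elif tag == "img" and attr.get("src"):
            self.images.append(attr["src"])


def crawl_images(start_url, fetch):
    """Breadth-first crawl of one domain, collecting absolute image URLs."""
    domain = urlsplit(start_url).hostname
    queue, seen, images = deque([start_url]), {start_url}, set()
    while queue:
        page_url = queue.popleft()
        html = fetch(page_url)  # returns None for pages it cannot fetch
        if html is None:
            continue
        parser = LinkAndImageParser()
        parser.feed(html)
        images.update(urljoin(page_url, src) for src in parser.images)
        for href in parser.links:
            link = urljoin(page_url, href)
            # Stay on the starting domain and never revisit a page.
            if urlsplit(link).hostname == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return images


print(sorted(crawl_images("https://example.com/", SITE.get)))
# ['https://example.com/logo.png', 'https://example.com/photos/1.jpg']
```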
This activity is best mitigated through appropriate server and website configuration, but as long as an image is displayed on a public website, its source file has to be available to the public in one way or another.
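Rate limiting is one common example of this kind of server configuration. The Python sketch below, with hypothetical thresholds and client identifiers, refuses a client that makes more than a set number of requests within a sliding time window. A determined scraper can slow down or spread its requests across many IP addresses, so this raises the cost of bulk downloading rather than preventing it.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Refuses clients that exceed max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.history = defaultdict(deque)  # client id -> recent request times

    def allow(self, client_id):
        now = self.clock()
        times = self.history[client_id]
        # Drop timestamps that have fallen outside the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_requests:
            return False
        times.append(now)
        return True


# A fake clock makes the behavior easy to follow.
fake_now = [0.0]
limiter = SlidingWindowLimiter(3, 1.0, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 1.5
print(limiter.allow("203.0.113.7"))  # True
```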
Obfuscating the source file
One way around this is to hide the image source file from the page’s source code so that it cannot be scraped by a bot (or manually stolen by a human user).
We’ve seen several creative ways to hide the image source file while still displaying the image on a webpage. Some webmasters, for example, configure their websites so that the image address can only be accessed directly from pages on the site’s own domain.
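Exactly how such restrictions are built varies. One widely used technique, shown here as an illustration rather than as the method those webmasters used, is the signed, expiring URL: the server appends an expiry time and an HMAC signature to each image address, and refuses requests whose signature is invalid or out of date. The secret key, parameter names and five-minute lifetime in this Python sketch are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret


def _signature(path, expires):
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_image_url(path, ttl_seconds=300, now=None):
    """Return the path with an expiry timestamp and HMAC signature appended."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_image_url(url, now=None):
    """Check signature and expiry before the server returns the image."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(sig, _signature(parts.path, expires))


url = sign_image_url("/images/photo.jpg", ttl_seconds=300, now=1_000_000)
print(verify_image_url(url, now=1_000_100))  # True (within five minutes)
print(verify_image_url(url, now=1_000_400))  # False (expired)
print(verify_image_url(url.replace("photo", "other"), now=1_000_100))  # False (tampered path)
```

This stops an address from being reused elsewhere once it expires, but the browser that received the image still holds a copy of it.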
Some images only load when certain conditions are met, such as when a user presses an arrow button in a gallery or scrolls down the page. Links to images loaded as a result of such events may not appear in the page’s initial source code, and may be harder for a bot to capture. However, such front-end events are becoming more standardized and easier to predict, especially now that most websites are built on a few major frameworks and libraries.
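To see why these patterns are easy to predict, consider a scraper that, instead of reading only src, harvests any attribute value that looks like an image address, including data-* attributes and srcset candidates. The markup and attribute names in this Python sketch are hypothetical; it will not find addresses that a script fetches at runtime from an API, but it does catch common lazy-loading and gallery conventions.

```python
import re
from html.parser import HTMLParser

# Treats values ending in a common image extension (optionally followed by a
# query string) as image addresses.
IMAGE_URL = re.compile(r"\.(?:jpe?g|png|gif|webp|avif)(?:\?|$)", re.IGNORECASE)


class ImageUrlHarvester(HTMLParser):
    """Collects image-like URLs from any attribute, not just src."""

    def __init__(self):
        super().__init__()
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if "srcset" in name:
                # srcset values are comma-separated "url descriptor" pairs.
                candidates = [c.split()[0] for c in value.split(",") if c.strip()]
            else:
                candidates = [value]
            self.urls.update(c for c in candidates if IMAGE_URL.search(c))


PAGE = """
<img src="/img/placeholder.gif" data-src="/img/photo-1.jpg" class="lazy">
<div class="slide" data-bg="/img/photo-2.webp"></div>
<img data-srcset="/img/photo-3-small.jpg 480w, /img/photo-3.jpg 1200w">
"""

harvester = ImageUrlHarvester()
harvester.feed(PAGE)
print(sorted(harvester.urls))
# ['/img/photo-1.jpg', '/img/photo-2.webp', '/img/photo-3-small.jpg',
#  '/img/photo-3.jpg', '/img/placeholder.gif']
```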
We’ve also seen more extreme measures, such as webmasters deliberately renaming image files with an incorrect extension to disguise the fact that they are JPEGs. Most browsers’ error handling will correct this, but the approach can backfire badly, as some browsers are unable to correct the mistake and display a blank image instead.
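Renaming a file does not change its contents, and image formats announce themselves in their first few bytes. The Python sketch below checks the leading "magic bytes" for JPEG, PNG, GIF and WebP, similar in spirit to the image signatures listed in the WHATWG MIME Sniffing Standard. The sample bytes are a stand-in for the start of a JPEG file, not a complete image.

```python
# Leading "magic bytes" for common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_type(data):
    """Identify an image from its first bytes, ignoring any file extension."""
    for signature, mime in SIGNATURES:
        if data.startswith(signature):
            return mime
    # WebP: "RIFF", a 4-byte size field, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


# A JPEG renamed to "photo.dat" still starts with the JPEG signature.
disguised = b"\xff\xd8\xff\xe0" + b"\x00" * 16
print(sniff_image_type(disguised))  # image/jpeg
```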
It is, however, relatively easy to circumvent these protections by inspecting the webpage’s source code with the developer tools built into almost every browser.
Even if the webmaster has disabled direct access to the image address, every image displayed in the browser can still be found among the resources the browser has downloaded.
Common underlying problem
While helpful to a certain extent, none of these methods resolves the underlying problem: sooner or later, the image has to be presented to the user, which means it is transmitted to, and cached by, the user’s browser. The image is almost always referenced in the page’s source code or present among the browser’s downloaded resources, and is therefore relatively easy to access.
This is where a solution such as SmartFrame comes in. Its robustness comes from a fundamentally different approach to serving images.
Rather than an image file being downloaded to the user’s browser, a request to display the image is sent to the cloud. The image data is served only if minimum security conditions are met. Once a handshake has been established between the website requesting the file and the cloud, the image can be transmitted through this channel, and only this channel. In other words, the image is transmitted and rendered, quite literally pixel by pixel, on the authorized webpage.