Node Unblocker for Web Scraping

Valentina Skakun
Last update: 30 Apr 2024

Node Unblocker is a Node.js library that allows you to create proxy servers to bypass content blocks. The library was originally developed as a censorship circumvention tool, but it has since evolved into a more general-purpose proxy server. This makes it a great choice for a variety of tasks, such as accessing content from other regions or protecting privacy.

In this article, we will cover the basics of Node Unblocker, including its features, usage, and limitations. We will also review an example of creating a simple application using this library.

What is Node Unblocker

As we mentioned earlier, Node Unblocker is a Node.js library that was created for traffic proxying. However, its functionality has been expanded over time. Today, it can be used to set up a full-fledged proxy server. This server can transmit, receive, and even modify traffic.

In other words, a Node Unblocker server provides a complete mechanism for proxying requests to remote resources, and it supports middleware. It can not only accept HTTP requests from a client and forward them to the requested resources but also change the contents of remote web pages along the way.

Reasons to Use Node Unblocker Proxy Network

The use of Node Unblocker can be motivated by a variety of factors, depending on the specific needs of the user or developer. For example, it can be used to proxy traffic or modify the contents of transmitted data. This makes it a valuable tool for a variety of purposes, including page optimization, script insertion, and content modification.

Figure: Mind map of Node Unblocker use cases, from web scraping to integration with other tools.

Enhancing Security on Public Wi-Fi

Node Unblocker is a tool that can be used to increase user anonymity and easily access blocked websites. It works by routing traffic through a proxy server, which hides the user’s real IP address and allows them to bypass network restrictions.

In public places, Node Unblocker can be especially useful for protecting privacy and accessing information that is otherwise blocked. However, it is essential to be aware of the potential impact on network performance. Using a web proxy server can increase network load, which can lead to slower speeds and reduced quality for all users on the network.

Accessing Content from Any Location

Another reason to create your own proxy server with the Node Unblocker library is to access information and resources that are not available in your region. You can build a simple application with this library and run it on a remote server in a location where these restrictions do not apply. Requests sent through that server then reach the content as if they came from the server's location.

Concealing Data from ISPs

Lastly, you may want to bypass restrictions imposed by your internet service provider (ISP). This can be useful if you want to hide your online activity from your ISP or access websites that your provider blocks. Routing traffic through a Node Unblocker server hides the client’s real IP address from the target resource and, provided the connection to the proxy itself uses HTTPS, makes it harder for the ISP to track and monitor the client’s activity.

As in the case of bypassing regional restrictions, your proxy server will help you get data that the provider has blocked. In this case, the proxy server acts as an intermediary. That is, you request data from the proxy server, to which the provider does not restrict access, and the proxy server, in turn, requests the information you need from the target resource and then returns its response to you.

Despite the potential benefits, routing traffic through a proxy server has some drawbacks. For example, the extra proxying hop and any additional encryption can slow down data transmission.

In addition, the security of transmitted data depends on the correct configuration of the proxy server and the use of reliable encryption methods. Incorrect configuration can leave data vulnerable to attacks.

Node Unblocker for Web Scraping: Step-by-Step Guide

First, we’ll create a basic application and start a server on port 3000. Then, we’ll discuss how to use the Unblocker library to create middleware. We’ll also cover deployment and the next steps. You can also visit the Unblocker library’s official documentation on GitHub to see various examples.

Prerequisites

We assume that you already have basic Node.js skills, so we will not cover installing and preparing the environment. If you are a beginner, you can read our other article, where you will find not only instructions on installing and preparing the environment but also essential skills in working with popular libraries such as Axios and Cheerio.

Now, let’s move on to creating our basic application using the Node Unblocker library. First, let’s install the npm packages that will be used later.

npm install express unblocker

Then, create an index.js file, where we will write all the subsequent code. To run the file, use the command:

node index.js

Now, let’s move on to the index.js file.

Creating the Base Application

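To start, let’s import the libraries we installed earlier. Because this code uses import syntax, Node.js must treat the file as an ES module, so add "type": "module" to your package.json: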

import express from 'express';
import Unblocker from 'unblocker';

Then, we’ll create an instance of the Express server, which is the foundation of our web application. Through it, we can configure routes and middleware and process requests.

const app = express();

Now, we can create an Unblocker instance and set the prefix for using Node Unblocker proxies:

const unblocker = new Unblocker({
    prefix: '/proxy/'
});

It is important to place unblocker as one of the first app.use() calls:

app.use(unblocker);

Now, we can configure how to handle different requests. For example, let’s handle requests to the root path (/) of our application, which is at the URL http://localhost:3000/. When a user requests this path, we’ll send the message “Welcome to the main page!”

app.get('/', (req, res) => {
    res.send('Welcome to the main page!');
});

To complete the setup, we need to configure the server to run on port 3000.

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
    console.log(`Server started on port ${PORT}`);
});

Let’s run and test it:

Figure: The Node Unblocker server started on port 3000.

To use proxying, you can use the prefix /proxy/. For example, http://localhost:3000/proxy/http://example.com/ will proxy requests to http://example.com/ through the proxy server at http://localhost:3000/.
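For scraping, it can be convenient to build proxied URLs in code rather than by hand. The sketch below assumes the server from this guide is running at http://localhost:3000 with the /proxy/ prefix. The names toProxyUrl and fetchViaProxy are hypothetical helpers, not part of the Unblocker library, and fetchViaProxy relies on the global fetch available in Node.js 18 and later.

```javascript
// Hypothetical helper: builds the URL to request from the proxy server.
// The default origin and prefix mirror the setup in this guide.
function toProxyUrl(targetUrl, proxyOrigin = 'http://localhost:3000', prefix = '/proxy/') {
    // new URL() throws on malformed input, so bad targets fail early.
    const target = new URL(targetUrl);
    if (target.protocol !== 'http:' && target.protocol !== 'https:') {
        throw new Error(`Unsupported protocol: ${target.protocol}`);
    }
    // Drop trailing slashes from the origin so the prefix is not doubled.
    return proxyOrigin.replace(/\/+$/, '') + prefix + target.href;
}

// Hypothetical helper: fetches a page through the proxy and returns its HTML.
// Defined here for illustration; it is not called in this snippet.
async function fetchViaProxy(targetUrl) {
    const response = await fetch(toProxyUrl(targetUrl));
    if (!response.ok) {
        throw new Error(`Proxy responded with status ${response.status}`);
    }
    return response.text();
}

console.log(toProxyUrl('http://example.com/'));
// prints "http://localhost:3000/proxy/http://example.com/"
```

The returned HTML can then be passed to a parser such as Cheerio, as in an ordinary scraping script.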

Using Middlewares

Node Unblocker allows you to create and use custom middleware to process requests and responses. For example, you can add middleware for validating requests, modifying response content, handling cookies, correcting URLs, and other tasks.

Let’s consider creating and using custom middleware to validate requests. For example, we will check if the request URL is a Google website. If not, the middleware will send the client a response with a 403 (Forbidden) status code and an error message.

To do this, before creating an instance of Unblocker, create its configuration parameters. Create a function called validateRequest that will perform the validation of the request URL:

function validateRequest(data) {
    // Allow only google.com and www.google.com; dots are escaped so they match literally
    if (!data.url.match(/^https?:\/\/(www\.)?google\.com\//)) {
        data.clientResponse.status(403).send('Access denied.');
    }
}

Then, add the middleware to the Unblocker configuration:

const config = {
    requestMiddleware: [
        validateRequest
    ]
};

Finally, use the created configuration when creating an instance of Unblocker:

const unblocker = new Unblocker(config);
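As a variation on validateRequest, the host can be checked with the built-in URL class instead of a regular expression, which avoids pattern pitfalls such as look-alike domains. This is a sketch under assumptions: ALLOWED_HOSTS and isAllowedUrl are illustrative names, and the middleware relies only on the data.url and data.clientResponse fields used above.

```javascript
// Illustrative allow-list; adjust the hostnames to your own needs.
const ALLOWED_HOSTS = new Set(['google.com', 'www.google.com']);

// Returns true only for http(s) URLs whose hostname is on the allow-list.
function isAllowedUrl(rawUrl) {
    try {
        const { protocol, hostname } = new URL(rawUrl);
        const isHttp = protocol === 'http:' || protocol === 'https:';
        return isHttp && ALLOWED_HOSTS.has(hostname);
    } catch {
        // new URL() throws on malformed input; treat that as not allowed.
        return false;
    }
}

// Request middleware with the same behavior as before: reject with 403.
function validateRequest(data) {
    if (!isAllowedUrl(data.url)) {
        data.clientResponse.status(403).send('Access denied.');
    }
}
```

Because the comparison is made against the parsed hostname, a URL such as https://google.com.evil.example/ is rejected even though it begins with a familiar name.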

Using the same principle, you can create other middleware, such as for redirecting requests:

function redirect(data) {
    data.clientResponse.redirect('https://www.example.com');
}

Or for modifying request headers:

function modifyHeaders(data) {
    // data.headers holds the headers that will be sent to the remote server
    data.headers['x-custom-header'] = 'Custom Value';
}

These examples only scratch the surface of the capabilities of middleware. You can create more complex middleware that performs various manipulations with requests and responses as needed by your application. It is important to remember that middleware is executed in the order it is added, so the order of middleware addition can affect the result of request processing.
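To illustrate why order matters, the sketch below registers two request middleware functions that touch the same header. It assumes that data.headers holds the headers forwarded to the remote site, as the library's README describes. The names stripCookies, addSessionCookie, and runInOrder are hypothetical, and runInOrder is only a stand-in for the library's own dispatch so the effect can be seen without starting a server.

```javascript
// Removes any cookies the client sent before the request is forwarded.
function stripCookies(data) {
    delete data.headers.cookie;
}

// Attaches a fixed cookie (the value is a placeholder for this example).
function addSessionCookie(data) {
    data.headers.cookie = 'session=demo';
}

// Both options live in one object that would be passed to new Unblocker(config).
const config = {
    prefix: '/proxy/',
    requestMiddleware: [stripCookies, addSessionCookie],
};

// Stand-in for the library's dispatch (illustration only):
// calls each function in array order on the same data object.
function runInOrder(middleware, data) {
    for (const fn of middleware) {
        fn(data);
    }
    return data;
}

const stripThenAdd = runInOrder(config.requestMiddleware, { headers: { cookie: 'old=1' } });
console.log(stripThenAdd.headers.cookie); // prints "session=demo"

const addThenStrip = runInOrder([addSessionCookie, stripCookies], { headers: { cookie: 'old=1' } });
console.log(addThenStrip.headers.cookie); // prints "undefined"
```

With the same two functions, swapping their positions in the array decides whether the cookie reaches the remote site at all.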

Deployment to Heroku and Next Steps

Heroku is a cloud platform that provides hosting and deployment services for web applications. It is often used to develop, test, and deploy applications without the need to manage physical infrastructure.

Heroku automatically scales applications based on load, so you can efficiently handle changing traffic volumes. A significant advantage is that Heroku supports a wide range of programming languages, including Node.js (JavaScript), Python, Ruby, Java, Go, Clojure, and PHP.

Heroku previously offered a free tier for small projects, allowing developers to launch their applications without cost in the early stages. However, as of this writing, Heroku only offers paid plans. The most basic plan costs $5 per month for 1,000 dyno hours. Usage is billed by the second, so you only pay for the resources you use.

To deploy your app to Heroku, you will need the Heroku CLI, which you can download from the official website. The website also provides documentation and instructions on how to use the CLI. Once you have logged in to Heroku and created an app, you can deploy your code to it. Heroku will then provide you with a random subdomain that you can use to access your application.

You may also need to configure your Unblocker to work with the Heroku environment, such as using environment variables to set the port and other configuration parameters that may differ from your local development environment.
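One way to keep environment-specific settings out of the code is a small loader like the one below. Heroku passes the port to bind to in the PORT environment variable. PROXY_PREFIX and loadConfig are hypothetical names chosen for this sketch, not settings that Heroku or Unblocker define.

```javascript
// Reads settings from environment variables, falling back to local defaults.
// PORT is set by Heroku; PROXY_PREFIX is a hypothetical variable for this sketch.
function loadConfig(env = process.env) {
    const port = Number.parseInt(env.PORT ?? '3000', 10);
    // Reject values that do not parse to a valid TCP port.
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`Invalid PORT value: ${env.PORT}`);
    }
    return { port, prefix: env.PROXY_PREFIX ?? '/proxy/' };
}

const settings = loadConfig({ PORT: '8080' });
console.log(settings); // prints { port: 8080, prefix: '/proxy/' }
```

The result could then feed the code from the guide, for example new Unblocker({ prefix: settings.prefix }) and app.listen(settings.port).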

In the future, you can also configure monitoring for your Unblocker using Heroku tools or other monitoring tools. This will allow you to track performance, identify issues, and take action to resolve them.

Limitations of Node Unblocker

Despite the large number of features that the Node Unblocker library supports, it has several drawbacks and limitations that should be considered when using it. The main ones are the lack of support for postMessage, the inability to configure a proxy pool, and problems with complex websites.

On the other hand, if you are using Node Unblocker to bypass restrictions during scraping, other solutions may be more suitable. For example, web scraping APIs can be used to scrape data from most resources. In that case, the API provider takes on the problems of scraping, including the use of proxies, bypassing CAPTCHAs and blocks, JS rendering, and more.

OAuth Issues and Inability to Handle postMessage

Node Unblocker does not work with websites that rely on postMessage, a browser API for passing messages between windows and frames. This matters for OAuth login forms, such as those of Google and Facebook, which may rely on postMessage-based interaction mechanisms.

OAuth is a protocol for authorizing access to third-party web resources. Because Node Unblocker does not support OAuth login forms, it can cause problems when working with resources that require OAuth authentication.

Ability to Work on Complex Sites

While Node Unblocker works well for basic websites, it may have limitations when used with more complex web resources. For example, websites that use advanced technologies such as JavaScript frameworks, AJAX requests, and dynamic content creation may not be fully supported.

As stated on the official GitHub page for the Node Unblocker project, popular but complex websites like Discord, Twitter, or YouTube may not work correctly. However, they also provide an example that detects YouTube video pages and replaces them with a custom page that simply serves the video.

Cloudflare Detection

Cloudflare is a popular web security and performance enhancement service. Some websites use Cloudflare to detect and block automated requests or proxy servers. Node Unblocker may have difficulty bypassing Cloudflare-protected sites, as it is not explicitly designed to avoid such security mechanisms.

Ability to Configure a Proxy Pool

Node Unblocker is primarily designed to circumvent internet censorship by running a single proxy server. Therefore, it does not support the use of proxy pools or proxy rotation.

Proxy rotation can be a complex task, especially if you need to support a variety of proxy servers and handle network issues. If your primary goal is web scraping, we recommend using a web scraping API. It will handle the challenges of proxy rotation and bypassing blocks so you can focus on parsing and analyzing data.
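To give a sense of what rotation involves, here is a minimal round-robin pool that skips proxies marked as failed. It is a sketch only: ProxyPool and the proxy addresses are hypothetical, and the class is not part of Node Unblocker. A production setup would also need health checks, retries, and per-site rate limits.

```javascript
// Hypothetical round-robin pool of proxy addresses.
class ProxyPool {
    constructor(proxies) {
        if (proxies.length === 0) {
            throw new Error('At least one proxy is required');
        }
        this.proxies = [...proxies];
        this.failed = new Set();
        this.index = 0;
    }

    // Returns the next proxy in round-robin order, skipping failed ones.
    next() {
        for (let i = 0; i < this.proxies.length; i++) {
            const proxy = this.proxies[this.index];
            this.index = (this.index + 1) % this.proxies.length;
            if (!this.failed.has(proxy)) {
                return proxy;
            }
        }
        throw new Error('No healthy proxies left');
    }

    // Excludes a proxy from future rotation, e.g. after a timeout or a block.
    markFailed(proxy) {
        this.failed.add(proxy);
    }
}

// Placeholder addresses for illustration.
const pool = new ProxyPool([
    'http://proxy-a.example:8080',
    'http://proxy-b.example:8080',
    'http://proxy-c.example:8080',
]);
pool.markFailed('http://proxy-b.example:8080');
console.log(pool.next()); // prints "http://proxy-a.example:8080"
console.log(pool.next()); // prints "http://proxy-c.example:8080"
console.log(pool.next()); // prints "http://proxy-a.example:8080"
```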

Conclusion and Takeaways

In this article, we provided an overview of Node Unblocker, a Node.js library for bypassing content blocking, accessing restricted websites, and supporting web scraping. We covered the basics of the library, including its installation, usage, and limitations.

Additionally, we discussed a practical example of creating a basic application, how to configure and run it, and how to add additional configuration to create custom middleware. We provided several examples of such configurations and code that you can modify and use for your purposes.

Finally, we discussed where and how you can deploy your project for cloud-based operation and the challenges you may face in developing your own Node Unblocker.
