How to Do Your Own Content Gap Analysis
In this guide, I’d like to share a small internal project that was initially built to solve a specific in-house problem but, I believe, could be useful to a wider audience. I’m an SEO specialist at HasData, where we develop various web scraping tools, including a SERP API and a Web Scraping API. Let’s face it: working at a company that provides tools for extracting web data but not using them yourself is a bit like being a professional chef who lives off nothing but crackers.
For those who want to dive right in or prefer to explore tools hands-on, here’s the link to our application: Page Content Gap Analyzer. It lets you easily analyze your content against your competitors’.
One of the tasks I aimed to automate was analyzing our own articles and comparing their content against competitors’ pages in the search results. The objective is to identify entities and concepts we may have overlooked but that consistently appear among the top 10 Google results for our target queries. This analysis helps refine our semantic framework, expand our content coverage, and ultimately enhance visibility and appeal. In the following sections, I’ll detail how this mini-project was developed and the insights we gained from it.
What Is Google NLP and What Are Entities?
Before we get into the technical details, let’s briefly review these foundational concepts.
Google NLP (Natural Language Processing) is a service provided by Google that leverages machine learning and text analysis to understand and break down written content. It can identify various characteristics, such as tone, syntax, categories, and, most importantly for our purposes, entities. In this context, entities include proper names, brands, organizations, products, places, and other significant elements mentioned in the text. Beyond simply extracting these entities, Google NLP also attempts to determine their type, relevance, and how they relate to the surrounding content.
Entities play a crucial role in SEO content analysis. They effectively highlight the core topics, individuals, or objects that your text emphasizes. By comparing your content to that of your competitors and identifying the key entities they feature, you can develop a more comprehensive list of aspects to address. This approach enhances the usefulness, informativeness, and relevance of your material to users.
In our mini-project, we use Google NLP to extract entities from the content automatically. Then we compare which entities are mentioned by competitors but not covered in our own text. This approach allows us to swiftly identify “gaps”: topics that should be addressed to make our content more complete.
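To make this concrete, here is a minimal Python sketch that sends text to the Cloud Natural Language analyzeEntities endpoint (a real, documented REST method) and prints the extracted entities. The API key placeholder and the sample sentence are illustrative:

import requests

API_KEY = "YOUR_GOOGLE_NLP_API_KEY"  # placeholder; see the key instructions below
ENDPOINT = f"https://language.googleapis.com/v1/documents:analyzeEntities?key={API_KEY}"

text = "HasData provides a SERP API for scraping Google search results."

payload = {
    "document": {"type": "PLAIN_TEXT", "content": text},
    "encodingType": "UTF8",
}

response = requests.post(ENDPOINT, json=payload, timeout=30)
response.raise_for_status()

for entity in response.json().get("entities", []):
    # Each entity carries a name, a type (ORGANIZATION, CONSUMER_GOOD, ...) and a
    # salience score between 0 and 1 describing its importance within the text.
    print(f'{entity["name"]:<25} {entity["type"]:<15} {entity["salience"]:.3f}')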
How to Use It
Now, let’s transition to the practical aspect. The interface is intuitive, making it easy to get started even if you’ve never used similar tools before.
Getting Started
- Enter your page URL. Input the address of the page you want to analyze. This could be your article, a landing page, or any other content whose quality you want to improve.
- Enter the main search query. Input the keyword or phrase you want to rank in the top 10 Google results for. The application will analyze competitors’ content based on this query.
- Insert your API keys. You will need two keys: one from HasData and one from Google NLP. Instructions on how to obtain them are provided below.
- Click “Start Analysis” and relax. The application automatically performs all necessary steps, from scraping search results to extracting entities with Google NLP and conducting the comparative analysis.
How to Get Your HasData API Key
After registering for an account on HasData, you’ll be directed to your Dashboard, where you can obtain your API key. Upon activation, you receive 1,000 free API credits, which is enough to analyze approximately 100 keywords.
How to Get Your Google NLP API Key
- Go to Google Cloud Platform and sign in to your account.
- Create a new project or select an existing one from the dashboard.
- Navigate to “APIs & Services” and click on “Enable APIs and Services”. Search for Google Cloud Natural Language API and enable it.
- Go to the “Credentials” section and click on “Create Credentials” > “API Key”. If you already have an API key, you can use that instead.
Regarding Limits:
Google NLP pricing is based on the number of Unicode characters processed, billed in units of 1,000 characters. For detailed pricing information, refer to the Google Cloud Natural Language API Pricing page. Google NLP offers 5,000 free units per month, which is typically enough for initial testing and small projects; as a rough illustration, a 10,000-character page consumes about 10 units, so the free tier covers on the order of 500 such pages. If your usage exceeds this limit, you can scale up by moving to a paid plan.
How Our Page Content Gap Analyzer Works
Now that you understand the purpose of our mini-tool, let’s explore its mechanics. I’ll detail how the application processes your input and generates valuable SEO insights.
Step 1: Fetching Search Results
We begin by sending a search query through HasData’s SERP API. The app retrieves the top 10 results from Google and displays them in a table with columns for:
- Position: The page’s rank in the search results.
- Source: The source or domain.
- Link: The direct link to the page.
- Snippet: A brief description of the page’s content.
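For readers who prefer to script this step directly, a request of this kind might look like the sketch below. The endpoint path, parameter names, header, and response field names here are assumptions based on common SERP API conventions, not HasData’s documented contract; check the official docs before relying on them.

import requests

HASDATA_API_KEY = "YOUR_HASDATA_API_KEY"  # placeholder
# Hypothetical endpoint and parameter names; consult the HasData docs for the real ones.
SERP_ENDPOINT = "https://api.hasdata.com/scrape/google/serp"

params = {"q": "content gap analysis", "num": 10}
headers = {"x-api-key": HASDATA_API_KEY}

resp = requests.get(SERP_ENDPOINT, params=params, headers=headers, timeout=30)
resp.raise_for_status()

# Assumed response shape: a list of organic results with position, source, link, snippet.
for result in resp.json().get("organicResults", []):
    print(result.get("position"), result.get("source"), result.get("link"))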
Step 2: Harvesting Page Content
Next, the app extracts the content from each of those top 10 pages. Using HasData’s Web Scraping API, it automatically collects the main text from every page. This results in a table with columns for:
- Position: The page’s rank.
- Link: The direct link to the page.
- Content: The page’s text content.
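A hedged sketch of this step as well, again with a hypothetical endpoint and response schema; consult the Web Scraping API docs for the actual paths and field names:

import requests

HASDATA_API_KEY = "YOUR_HASDATA_API_KEY"  # placeholder
# Hypothetical endpoint; the real path and response fields may differ.
SCRAPE_ENDPOINT = "https://api.hasdata.com/scrape/web"

def fetch_page_text(url: str) -> str:
    """Fetch a page through the scraping API and return its extracted text."""
    resp = requests.get(
        SCRAPE_ENDPOINT,
        params={"url": url},
        headers={"x-api-key": HASDATA_API_KEY},
        timeout=60,
    )
    resp.raise_for_status()
    # Assumed field name for the extracted text; adjust to the actual schema.
    return resp.json().get("content", "")

top_pages = ["https://example.com/competitor-article"]  # links collected in Step 1
texts = {url: fetch_page_text(url) for url in top_pages}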
Step 3: Analyzing Content with Google NLP
The collected text from each of the top 10 results is then fed into Google’s Natural Language Processing (NLP) API. This service processes the text and extracts key entities, determining their importance and role within the content. You’ll end up with a table showing:
- Entity: The name of the identified entity.
- Salience: The significance of the entity within the text.
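If you want to reproduce this step yourself, reducing an analyzeEntities response to that Entity/Salience table is a small transformation. A minimal sketch, assuming the public response schema (an "entities" array with "name" and "salience" fields); the sample data is made up for illustration:

def top_entities(nlp_response: dict, limit: int = 30) -> list:
    """Return the most salient entities from an analyzeEntities response."""
    entities = nlp_response.get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [(e["name"], e["salience"]) for e in ranked[:limit]]

# Made-up sample response for illustration
sample = {"entities": [
    {"name": "Google", "salience": 0.17},
    {"name": "web scraping", "salience": 0.42},
]}
for name, salience in top_entities(sample):
    print(f"{name:<20} {salience:.2f}")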
Step 4: Processing Your Target Page
We analyze the content of your target page using Google NLP. This data is then compared to your competitors’ entities to identify potential gaps.
Step 5: Analyzing Entities and Identifying Content Gaps
Finally, we compare the top 30 entities from your page to those of your competitors. The results are presented in a table showing:
- Entity: The name of the entity.
- Count: The number of competitors using that entity.
- URLs: A list of links to competitor pages where the entity is mentioned.
Entities present in your competitors’ content but missing from yours are highlighted in orange. This visual cue makes it easy to spot content gaps and take actionable steps to enhance your content accordingly.
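The comparison logic itself boils down to a set difference plus a count. Here is a minimal Python sketch, assuming you already have your own entity set and one entity set per competitor URL; all names and URLs are illustrative:

def find_gaps(your_entities: set, competitors: dict) -> list:
    """Entities competitors mention that your page does not, with counts and URLs."""
    gaps = {}
    for url, entities in competitors.items():
        for entity in entities - your_entities:
            gaps.setdefault(entity, []).append(url)
    # Rows of (entity, competitor count, URLs), most common gaps first.
    rows = [(e, len(urls), ", ".join(urls)) for e, urls in gaps.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

yours = {"serp api", "web scraping"}
theirs = {
    "https://a.example": {"web scraping", "proxies", "captcha"},
    "https://b.example": {"proxies", "headless browsers"},
}
for row in find_gaps(yours, theirs):
    print(row)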
Bonus: DIY Content Analysis Scripts
For those who have programming skills or enjoy customizing their processes, I’ve prepared a couple of useful scripts. These scripts were the foundation for creating our analyzer and can assist you in conducting your own SEO analyses.
Python Script for Google Colab
This script extracts content from a list of URLs, analyzes entities using Google NLP, and saves the results to a Google Sheet. Unlike our mini-service, it accepts a list of URLs as input, allowing you to analyze not just the top 10 results but any reasonable number of pages. This is particularly useful for deeper competitor analysis or investigating specific market segments.
🔗 Access the Google Colab Script
JavaScript Script for Google Sheets
This JavaScript script is designed for Google Sheets and compares the entities of your target page with those of your competitors. The results are automatically output to a new sheet, simplifying the comparative analysis process.
Before running the script, make sure to specify your own domain within the script, as it searches for the column containing your domain to locate your entities.
function onOpen() {
  const ui = SpreadsheetApp.getUi();
  ui.createMenu('Custom Scripts')
    .addItem('Compare Entities', 'compareEntities')
    .addToUi();
}

function compareEntities() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  const data = sheet.getDataRange().getValues(); // Get all data
  const userDomain = 'yourdomain.com'; // Replace with your own domain

  let yourEntities = {}; // Object for your entities
  let competitorsColumns = []; // Array for competitor column indices
  let entityUrls = {}; // Object to store URLs where each entity appears
  let highlightedEntities = {}; // Object for entities to highlight in orange

  // Step 1: Find the column with the user's domain
  let yourUrlColumn = -1;
  for (let i = 0; i < data[0].length; i++) {
    if (String(data[0][i]).includes(userDomain)) {
      yourUrlColumn = i;
      break;
    }
  }

  if (yourUrlColumn === -1) {
    SpreadsheetApp.getUi().alert(`URL with "${userDomain}" not found.`);
    return;
  }

  // Find the "Entity" / "Salience" header row for your column
  let startRow = -1;
  for (let i = 0; i < data.length; i++) {
    if (data[i][yourUrlColumn] === 'Entity' && data[i][yourUrlColumn + 1] === 'Salience') {
      startRow = i + 1; // Start from the first data row after the headers
      break;
    }
  }

  if (startRow === -1) {
    SpreadsheetApp.getUi().alert('Entity and Salience not found.');
    return;
  }

  // Extract the top-30 entities for your URL (guarding against running past the sheet)
  for (let i = startRow; i < Math.min(startRow + 30, data.length); i++) {
    const entity = data[i][yourUrlColumn];
    const salience = data[i][yourUrlColumn + 1];
    if (entity && salience) {
      yourEntities[entity] = salience; // Save entities with their salience
      // Remember where this entity appears
      if (!entityUrls[entity]) {
        entityUrls[entity] = [userDomain];
      } else {
        entityUrls[entity].push(userDomain);
      }
    }
  }

  // Step 2: Find all columns whose header contains "http", excluding the user's domain
  for (let i = 0; i < data[0].length; i++) {
    const header = String(data[0][i]);
    if (header.includes('http') && !header.includes(userDomain)) {
      competitorsColumns.push(i); // Save competitor column indices
    }
  }

  // Step 3: Process competitors by iterating through each column
  competitorsColumns.forEach((competitorColumn) => {
    // Find the "Entity" / "Salience" header row for this competitor
    for (let row = 0; row < data.length; row++) {
      if (data[row][competitorColumn] === 'Entity' && data[row][competitorColumn + 1] === 'Salience') {
        // After the header row, the competitor's top-30 entities begin
        for (let i = row + 1; i < Math.min(row + 31, data.length); i++) {
          const entity = data[i][competitorColumn];
          const salience = data[i][competitorColumn + 1];
          if (entity && salience) {
            // Add the competitor URL to this entity's list (exact match, no normalization)
            if (!entityUrls[entity]) {
              entityUrls[entity] = [data[0][competitorColumn]];
            } else {
              entityUrls[entity].push(data[0][competitorColumn]);
            }
            // If the entity is not in your list, mark it for highlighting
            if (!yourEntities.hasOwnProperty(entity)) {
              if (!highlightedEntities[entity]) {
                highlightedEntities[entity] = [];
              }
              highlightedEntities[entity].push(data[0][competitorColumn]);
            }
          }
        }
        break;
      }
    }
  });

  // Step 4: Highlight entities missing from your list in orange
  Object.keys(highlightedEntities).forEach((entity) => {
    const competitorUrls = highlightedEntities[entity]; // Competitors where this entity appears
    competitorUrls.forEach((url) => {
      // Scan all cells and highlight matches in the corresponding competitor column
      for (let i = 0; i < data.length; i++) {
        for (let col = 0; col < data[0].length; col++) {
          if (data[i][col] === entity && String(data[0][col]).includes(url)) {
            const range = sheet.getRange(i + 1, col + 1); // Ranges are 1-indexed
            range.setBackground('orange');
          }
        }
      }
    });
  });

  // Generate a summary of the gaps
  generateSummary(highlightedEntities);
}

function generateSummary(highlightedEntities) {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  let summarySheet = ss.getSheetByName('summary');

  // Reuse the sheet if it exists, otherwise create it
  if (summarySheet) {
    summarySheet.clear();
  } else {
    summarySheet = ss.insertSheet('summary');
  }

  // Headers for the summary
  summarySheet.appendRow(['Entity', 'Count', 'URLs']);

  // Populate the summary with only the highlighted (missing) entities
  const entitiesArray = [];
  Object.keys(highlightedEntities).forEach((entity) => {
    const count = highlightedEntities[entity].length;
    const urls = highlightedEntities[entity].join(', '); // Comma-separated list of URLs
    entitiesArray.push([entity, count, urls]);
  });

  // Sort by Count in descending order before writing, so no re-sort
  // (which would otherwise drag the header row into the sorted range) is needed
  entitiesArray.sort((a, b) => b[1] - a[1]);

  // Write the sorted rows below the header in a single call
  if (entitiesArray.length > 0) {
    summarySheet.getRange(2, 1, entitiesArray.length, 3).setValues(entitiesArray);
  }
}
These scripts provide basic functionality for content analysis and can be adapted to meet your specific needs. If you’re interested in more advanced features or want to integrate these tools into your workflows, they serve as an excellent starting point.
Conclusion
The SEO tools market is saturated with comprehensive solutions, each offering unique features and capabilities. Our analyzer doesn’t claim to be a universal tool. Instead, it serves as one of many instruments that provide a straightforward and accessible way to identify content gaps. By incorporating it into your toolkit alongside other SEO solutions, you can adopt a more holistic approach to analyzing and enhancing your website, making it more competitive and appealing to users.
I hope this tool proves to be a valuable addition to your SEO workflows and becomes a regular part of your optimization toolkit. If you have any questions or ideas for expanding its functionality, feel free to connect with me on LinkedIn and reach out directly. I’d love to hear your feedback and discuss ways to enhance the tool further. Happy optimizing!