WDF*IDF: What can the SEO miracle formula really do?

The struggle to grab the top spot in the search results of Google (and other search engines) is an ongoing battle. It used to be considered almost an SEO sport to cram as many keywords as possible into content, but now the art of search engine optimisation revolves around creating unique texts. Whether on the homepage, a subpage, a product page or a category page of your website: exclusive, relevant content that differs from that of competitors in terms of copywriting and keyword usage is key when it comes to outperforming the competition and being placed first in the results. A term that is increasingly used in this context is the WDF*IDF analysis or formula.

What is WDF*IDF?

WDF*IDF is an analysis method that can be used within the scope of search engine optimisation to determine keywords and terms that sustainably increase the relevance of published texts, and, therefore, the entire website. It’s a formula that multiplies the two values of Within Document Frequency (WDF) and Inverse Document Frequency (IDF). The result is the relative term frequency (also term weighting) of a document, relative to all other web documents which also contain the keyword included in the analysis. Before the WDF*IDF analysis can be run, you first need to determine the two factors mentioned.

How to determine the Within Document Frequency (WDF) value

The WDF describes how often a particular term occurs in a document compared to all other terms it contains. To increase the validity of the determined value, the formula is based on a logarithm that prevents the central term from being weighted too heavily. The term was first mentioned in 1992 by Donna Harman, whose article “Ranking Algorithms” presents the WDF as a way to assign the words of a particular document a weighting value useful for information science. In website optimisation, the WDF value has been used for some time as an alternative to the less flexible keyword density value, which merely reflects the relative frequency of a key term. The formula for determining the Within Document Frequency is:

$$\mathrm{WDF}(i) = \frac{\log_2(\mathrm{Freq}(i,j) + 1)}{\log_2(L_j)}$$

The individual components of the equation can be explained as follows:

i: Term whose frequency is being determined  
j: Document being analysed  
Lj: Total number of words in the document "j"  
Freq(i,j): Frequency of the term "i" in the document "j"  
log2: Logarithm to base 2  

Therefore, the WDF value for a term “i” in the document “j” is determined by adding 1 to the frequency of the term, taking the base-2 logarithm of that sum, and dividing it by the base-2 logarithm of the total number of words in the document. Thanks to the logarithm, the result is more meaningful than pure keyword density or relative frequency. An example can illustrate this:

An examined term that appears 50 times in a 1,000-word document has a Within Document Frequency of 0.57 (rounded). The relative frequency, in this case, is 5 percent. If you now increase the frequency of the term for optimisation purposes, to say 500, you get a WDF value of 0.9 (rounded) – i.e. a value that is only around 1.6 times the original one. On the other hand, if you take the relative frequency (which has now risen to 50 percent) as the basis, you will see a value 10 times the original.
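The arithmetic of this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `wdf` and its parameter names are chosen here for illustration, and the formula is the one described in the text (base-2 logarithm of the frequency plus 1, divided by the base-2 logarithm of the word count).

```python
import math


def wdf(term_count: int, total_words: int) -> float:
    """Within Document Frequency: log2(Freq(i,j) + 1) / log2(Lj)."""
    if total_words < 2:
        # log2(1) is 0, so the quotient is undefined for one-word documents
        raise ValueError("total_words must be at least 2")
    return math.log2(term_count + 1) / math.log2(total_words)


original = wdf(50, 1000)    # term appears 50 times in 1,000 words
stuffed = wdf(500, 1000)    # term appears 500 times in 1,000 words

print(round(original, 2))            # 0.57
print(round(stuffed, 2))             # 0.9
print(round(stuffed / original, 2))  # growth of the WDF value
print(500 / 50)                      # growth of the relative frequency
```

The logarithm is what dampens the effect: a tenfold increase in raw occurrences raises the WDF value far less than tenfold.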

How to determine the Inverse Document Frequency (IDF) value

Inverse Document Frequency (IDF) is a value that measures the significance of a term, not by its frequency in a particular document, but by its distribution and use across the entire body of documents: the more potential a term has, the higher its Inverse Document Frequency. The optimal case is a term that is very common in just a few documents. On the other hand, words that appear in almost every document, or that appear only rarely, are of minor importance. For example, the word “imprint” has a very low IDF value because it is used on almost every website.

To calculate the Inverse Document Frequency value, the following formula is needed (it also uses a logarithm to dampen the results):

$$\mathrm{IDF}(i) = \log\left(1 + \frac{N_D}{f_i}\right)$$

The different components of the IDF equation can be explained as follows:

i: Term for which the Inverse Document Frequency is being determined  
log: Logarithm to base 10 (or to any other base b)  
ND: Number of documents in the result set (documents containing relevant terms)  
fi: Number of documents in which the term "i" occurs  

Therefore, to determine the IDF value of a term "i", divide the total number of (relevant) documents contained in the result set by the number of documents containing the term and then add 1. Finally, take the logarithm “log” of the result of that calculation.
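The calculation steps just described can be sketched in Python as follows. The function name `idf`, its parameter names and the document counts in the usage lines are illustrative assumptions; base 10 is used as the default because the text names it, but any base can be passed in.

```python
import math


def idf(total_docs: int, docs_with_term: int, base: float = 10.0) -> float:
    """Inverse Document Frequency: log(1 + ND / fi)."""
    if docs_with_term <= 0:
        raise ValueError("the term must occur in at least one document")
    return math.log(1 + total_docs / docs_with_term, base)


# Hypothetical result set of 1,000 documents:
rare_term = idf(1000, 10)     # term occurs in only 10 documents
common_term = idf(1000, 900)  # term occurs in 900 documents

print(round(rare_term, 2), round(common_term, 2))
```

A term found in only a few documents receives a clearly higher value than one found almost everywhere, which matches the “imprint” example.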

How is the number of all relevant documents in the result set calculated?

Because it includes ND, the IDF value cannot be determined uniformly. Instead, it depends on the frequency of all meaningful words in the examined document, as well as on the absolute number of documents the analysis is based on. When analysing web documents for SEO purposes, that number is potentially huge, as all pages indexed by Google (or other search engines) are eligible. To obtain a specific value nevertheless, the number of search results for each relevant term in the document is determined and the figures are added together. For example, a highly simplified document that only contains the terms “Search Engine Optimisation” (17,300,000 search results, December 2017) and “Web Analytics” (2,200,000 search results, December 2017) has an ND value of 19,500,000.
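The simplified example can be reproduced in a short Python sketch. The search result counts are the ones quoted in the text, with the first figure read as 17,300,000, which is consistent with the stated total. Using each term's own result count as its fi value in the second step is an assumption made here for illustration, not something the text prescribes.

```python
import math

# Search result counts quoted in the text (December 2017)
search_results = {
    "Search Engine Optimisation": 17_300_000,
    "Web Analytics": 2_200_000,
}

# ND: sum of the search results of all relevant terms in the document
n_d = sum(search_results.values())
print(n_d)  # 19500000

# Assumption: each term's own result count serves as its fi value
for term, f_i in search_results.items():
    print(term, round(math.log10(1 + n_d / f_i), 3))
```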

WDF*IDF: The combination of both formulae

Because the Within Document Frequency represents the relevance of a term within a particular document and the Inverse Document Frequency reflects the role of a term relative to all of the documents in the search results, merging both values provides deeper insight into the actual weight of a term and its potential for optimising existing text content. To this end, you only need to multiply the two values, which yields the following overall formula for the WDF*IDF analysis:

$$\mathrm{WDF{*}IDF}(i,j) = \frac{\log_2(\mathrm{Freq}(i,j) + 1)}{\log_2(L_j)} \cdot \log\left(1 + \frac{N_D}{f_i}\right)$$

In principle, this means bringing all the important components together and using them to assess the weight of the terms used in web texts. Of course, the bigger the database, the more meaningful the results are. However, to make the WDF*IDF analysis useful for search engine optimisation, it must be applied to all meaningful words within a document. This would simply be too much effort to do manually, which is why a WDF*IDF tool is part of any serious approach to calculating term weighting. On the one hand, these programs (see below) help to analyse the existing textual material. On the other hand, they also provide clues as to which terms a document lacks in order to be as unique and relevant as possible.
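To give an idea of what such a tool does internally, here is a minimal Python sketch that applies the product of WDF and IDF to every word of a document. It rests on several assumptions that are not part of the article: a tiny hypothetical corpus stands in for the result set, ND is simply the number of documents in that corpus, the tokeniser is naive, and no stop words are filtered out. Real tools work with search engine data and are considerably more elaborate.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokeniser: lowercase alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wdf_idf(doc: str, corpus: list[str]) -> dict[str, float]:
    """Score every term of `doc` against `corpus` (the assumed result set)."""
    words = tokenize(doc)
    if len(words) < 2:
        raise ValueError("document needs at least two words")
    counts = Counter(words)
    corpus_terms = [set(tokenize(d)) for d in corpus]
    n_d = len(corpus)
    scores = {}
    for term, freq in counts.items():
        f_i = sum(term in terms for terms in corpus_terms)
        if f_i == 0:
            continue  # term not found in the result set, no IDF available
        wdf = math.log2(freq + 1) / math.log2(len(words))
        idf = math.log10(1 + n_d / f_i)
        scores[term] = wdf * idf
    return scores


# Hypothetical mini corpus
corpus = [
    "seo tools help with keyword research and seo audits",
    "web analytics tools measure traffic",
    "keyword research tools for content teams",
]
scores = wdf_idf(corpus[0], corpus)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:10s} {score:.3f}")
```

In this toy corpus, "seo" ends up at the top because it is frequent in the analysed document and absent from the others, while "tools" scores low because it appears in every document.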

Conclusion

The weighting of the term "i" in the document "j" is determined by multiplying the Within Document Frequency of the term "i" in the document "j" by the Inverse Document Frequency of the term "i" across the result set.

The benefits of WDF*IDF for Search Engine Optimisation

The advantages of a comprehensive WDF*IDF analysis are obvious: the values obtained for weighting key terms serve as reliable reference points for writing texts so that:

  • they have high relevance for search engines
  • they cover topics which do not have a lot of competition
  • they do not have any keyword spam
  • they are as unique as possible

Anyone who is dissatisfied with their own website rankings or strives for improved optimisation has a helpful ally in WDF*IDF values. Based on the analysis data, copywriters can create concrete guidelines for revising their content that aren’t just aimed at increasing the keyword density or incorporating other keywords into the text.

Note

For all the usefulness of a thorough WDF*IDF analysis, you should never forget that content is written primarily for readers and not for search engines. In addition, since search engines are getting better and better at capturing texts semantically, in the long run there is simply no way around strong content in which keywords and other technical additions play just a minor part.

What are the weak aspects of WDF*IDF analyses?

Although WDF*IDF provides very valuable input for website optimisation, there are a few issues that should be considered before analysing and evaluating results. For example, a fundamental problem is that a WDF*IDF analysis always includes all the textual elements of a document, whether they are headings, category/product descriptions, or captions. The individual components are not differentiated. Even if only one paragraph is too keyword-heavy or contains too few essential terms, the analysis method won’t reveal this, since the frequency weighting is always evaluated for the entire document.
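The following Python sketch illustrates this dilution effect with invented numbers. The paragraph lengths and term counts are purely hypothetical, and evaluating the WDF formula per paragraph is done here only for comparison; it is not part of the WDF*IDF method described in this article.

```python
import math


def wdf(term_count: int, total_words: int) -> float:
    """Within Document Frequency: log2(Freq(i,j) + 1) / log2(Lj)."""
    return math.log2(term_count + 1) / math.log2(total_words)


# Hypothetical document: (occurrences of the term, word count) per paragraph.
# Only the first paragraph uses the term, and it uses it heavily.
paragraphs = [(20, 100), (0, 300), (0, 300), (0, 300)]

doc_count = sum(count for count, _ in paragraphs)
doc_words = sum(words for _, words in paragraphs)

print("whole document:", round(wdf(doc_count, doc_words), 2))
for number, (count, words) in enumerate(paragraphs, start=1):
    print(f"paragraph {number}:", round(wdf(count, words), 2))
```

The single document-level value gives no hint that all occurrences sit in one paragraph while the remaining three do not mention the term at all.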

Tip

Before considering a WDF*IDF analysis for your own website, you should carefully check whether the embedded content is suitable for the term frequency analysis method. In addition, the results obtained should be carefully scrutinised in order to detect potential fallacies (too small a database, for example) that need to be avoided.

Another weakness of the WDF*IDF formula is that it only becomes really informative with a high word count. For shorter texts like product descriptions, small blog entries or news articles, the analysis does not provide meaningful, usable results. This is why it’s often unsuitable for certain websites like online stores or news portals. For sites that rely on heavy editorial work, the drawback is that WDF*IDF analysis is difficult to incorporate into the standard workflow. Since fast response times and timeliness are particularly in demand here, optimising the texts after publishing would be a practical, if complex, solution.

An overview of the advantages and disadvantages of the WDF*IDF analysis

| Advantages of WDF*IDF analysis | Disadvantages of WDF*IDF analysis |
| --- | --- |
| Provides a great opportunity to expose existing keyword spam | Always examines the complete text content of a document |
| Puts relevance and uniqueness in the foreground as crucial criteria for term weighting | Provides no information about specific paragraphs or passages that are worth optimising |
| Rates terms with lower competition better than highly competitive ones | Not suitable for short texts with few words |
| Combines document-specific and cross-document analysis | Hard to integrate into work processes which prioritise timeliness and responsiveness |
| Dampens results through logarithms for more meaningful values | Precise number of all relevant documents is difficult to determine |

What WDF*IDF tools are there?

There are several tools that can be used to perform a WDF*IDF analysis. It is important to distinguish between applications that are only part of an SEO suite and those that are available as standalone solutions. Depending on the range of functions and the usage options, the individual tools differ in terms of cost. To give a brief overview of the variety of applications, we have compiled some of the best WDF*IDF tools in the following list:

  • OnpageDoc: If you would like to analyse and optimise your website’s SEO status, you can use OnpageDoc, the complete package from SAC Solutions GmbH in Cologne, Germany. If you take out a monthly subscription, you’ll have access to a variety of features to review and improve keywords, meta tags, backlinks and more. A WDF*IDF tool for term weighting analysis and targeted competitive comparison is also part of the portfolio. Those who do not want to access the entire suite can also use the tool for free at wdfidf-tool.com. However, the number of possible queries there is limited to 100 per hour (shared by all users).
  • SEOlyze: Semantic analysis and research based on the WDF*IDF principle can also be done with the paid content analysis section of SEOlyze. Helminger GmbH, which is based in Austria, focuses on helping clients perfect website content and offers various tools like a W-questions tool for research, a duplicate content checker or readability analyses (Flesch/Wiener factual text formula) to achieve this. The centerpiece, however, is the comprehensive WDF*IDF analysis function, the results of which can be implemented directly into the SEOlyze interface, thanks to the integrated editor. In addition to the WDF*IDF tool, the SEO suite includes various rank-tracking features, as well as several other tools for general on-page optimisation (keyword analysis, metadata checker, images, links, etc.).
     
  • XOVI: XOVI, a subsidiary of Plesk since 2017, provides its customers with an SEO suite that leaves little to be desired. The paid XOVI Toolbox, which is available in multiple languages, comes in three different plans (Pro, Business and Enterprise). It also includes tools to keep track of ads, traffic, keywords, backlinks and social signals. The XOVI TextOptimizer also includes a WDF*IDF text tool that not only calculates the relevance of the terms used and suggests other terms based on the first ten Google search results pages, but also allows for direct editing.
     
  • Seobility: Seobility offers numerous SEO tools free of charge on its homepage – such as a simple WDF*IDF tool. The web application allows users to check the weighting of a term based on the WDF*IDF formula. In addition, the tool displays other terms (including their frequency values) that match the word you are looking for. Access to the Seobility tool is limited to five analyses per day per user. Users who create an account gain access to the advanced search settings in order to, for example, adjust the base of the logarithm, increase the number of search results considered or select the platform (desktop/mobile) to optimise for.