At this point, just about everyone has used a search engine like Google, Bing, Yahoo, or DuckDuckGo to look something up. But there is a lot more science and sophistication behind how a search engine works, and it can make all the difference between whether your website ranks well or gets lost deep down the page. A full explanation could easily turn into a weekend-long conversation if we dug into every nook and cranny, so we’ll stick to the essentials.

Search engines essentially have three primary functions. Taking each of them into account will go a long way toward developing and implementing an effective SEO strategy.

Crawling

This is a critical component that helps make search engines so successful at finding the content you or your potential customers want. When a search engine “crawls,” it is essentially scouring the Internet for content, which includes examining the code of each URL it finds.

Indexing

This is the function of a search engine’s database that stores the content and code discovered during crawling. That material is then organized into an index that is fast, easy to query, and can be matched against all applicable searches.

Ranking

This is the process of deciding which indexed pages best answer a searcher’s query. Results are then ordered and displayed to the user, with the most relevant first and the least relevant farther down the page or on later pages.

How Does A Search Engine Crawl?

Crawling is essentially a search engine’s “discovery process.” This is where Google and other popular search engines deploy “bots,” sometimes referred to as “crawlers” or “spiders,” to search out new and updated content. This could be a webpage, an image, a new video, or even something static like an updated PDF. These bots navigate their way through content via links.

In one example of this, a Googlebot begins the crawling process by fetching a few web pages to be analyzed. The bot then follows the links on those pages to discover new URLs. This method of using links to navigate gives the crawler the ability to find new content, which can then be added to Google’s massive index of discovered URLs.
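
To make the idea concrete, here is a minimal, hypothetical link-following crawler sketched in Python. It is purely illustrative and is not how Googlebot is actually built; a real crawler adds robots.txt handling, politeness limits, deduplication, page rendering, and much more.

```python
# Minimal sketch of how a crawler discovers pages by following links.
# Illustrative only -- real crawlers are far more sophisticated.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, max_pages=10):
    """Fetch a few pages, then follow their links to discover new URLs."""
    seen, queue = set(), deque([seed_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue  # skip pages that fail to fetch
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            queue.append(urljoin(url, href))  # resolve relative links
    return seen


print(crawl("https://example.com"))  # placeholder seed URL
```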

How Is Search Engine Ranking Determined?

Search engine ranking is the process by which a proprietary algorithm takes the indexed information collected by the crawling bots and orders it by relevance. That relevance is measured against the query the user enters into the search engine: the more relevant a page is to the query, the higher it is ranked.

There is, of course, a lot more that goes into the values each search engine’s algorithm assigns. These algorithms also keep evolving, practically on a daily basis, to surface increasingly high-quality, relevant content. Google has said its updates are ongoing, with a handful of large core updates each year that send search engine optimizers into a mix of panic, fear, and joy for a few weeks while the changes roll out. These updates can lift some websites to the top of a results page, while for others they can be catastrophic and essentially wipe the site off the map.

Is It Possible To Block A Bot Or Crawler From Certain Parts Of Your Site?

Yes, it is possible to block a crawler from examining certain parts of your site, which essentially prevents the search engine from storing or indexing those pages. This is usually reserved for sensitive information or special services you only offer to a select few customers; in general, if you want your site to rank well, you want its content to be as easy to access and crawl as possible. Aside from password-protecting areas of the site, there is a small file that lives on the server called robots.txt. It contains directives that tell bots which parts of the site they may and may not crawl. While mainstream bots are pretty good at respecting these directives, compliance is completely voluntary.
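
As an illustration, the small Python sketch below uses the standard library’s robots.txt parser to decide whether a given URL may be crawled. The domain, paths, and example rules are hypothetical, and remember that this only keeps out bots that choose to comply.

```python
# Sketch of how a well-behaved bot checks robots.txt before crawling.
# The domain, paths, and rules below are hypothetical examples.
from urllib.robotparser import RobotFileParser

# A typical robots.txt might contain rules like:
#   User-agent: *
#   Disallow: /private/
#   Disallow: /checkout/
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt file

for path in ("/blog/how-search-works", "/private/client-area"):
    url = "https://www.example.com" + path
    allowed = rp.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawl' if allowed else 'skip (disallowed)'}")
```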

Is There A Way To Tell If A Search Engine Has Crawled My Site?

Fortunately, it’s relatively easy to determine whether a particular search engine has crawled and indexed your site or a specific page. Using Google as an example, all you have to do is use the advanced search operator “site:yourdomain.com” in a normal Google search; the results will show you roughly what Google has in its index for your domain.

If you want more in-depth results, you can go to Google Search Console and use the Index Coverage report, which gives you much more accurate data. Best of all, Google Search Console is currently free.

Lastly, if you have access to your server’s log files, they contain quite a bit of detailed information about all the activity, including bot visits, that occurs across your site’s files.
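
For example, a rough Python sketch like the one below can scan an access log for search engine bot visits. The log path and the assumption of the common/combined log format are placeholders you would adjust for your own server.

```python
# Rough sketch: scan a web server access log for search engine bot visits.
# The log path and line format (combined log format) are assumptions.
import re
from collections import Counter

LOG_FILE = "/var/log/apache2/access.log"  # hypothetical path
BOT_PATTERN = re.compile(r"Googlebot|bingbot|DuckDuckBot", re.IGNORECASE)

hits = Counter()
with open(LOG_FILE, encoding="utf-8", errors="ignore") as log:
    for line in log:
        if BOT_PATTERN.search(line):
            # In the combined log format the requested URL is inside the
            # quoted request string, e.g. "GET /page HTTP/1.1"
            match = re.search(r'"[A-Z]+ (\S+)', line)
            if match:
                hits[match.group(1)] += 1

# Show which URLs the bots requested most often
for url, count in hits.most_common(10):
    print(count, url)
```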

What If My Site Doesn’t Show Up In Google’s Search Console Or Domain Search?

There are a few possible reasons why your site might not be showing up as indexed in Google Search Console. These include:

  • You simply have a new site that hasn’t been crawled yet by that specific search engine. It can typically take up to 8 weeks for a brand-new site.
  • You don’t have any external links pointing to your website.
  • You are using a type of site navigation that makes it difficult for a bot to crawl it efficiently.
  • Your website has been flagged for spamming content in the past and penalized by the search engine.
  • You have chosen to block search engines from the entire site. In WordPress, for example, there is an option to discourage search engines from indexing the site, and it is often left checked by accident after a site is built.
  • Your robots.txt file or .htaccess file on the server contains instructions telling bots to ignore your site; it’s worth checking both.

How Can I Get A Search Engine To Crawl My Website?

Let’s say that you used Google Search Console or the “site:domain.com” advanced search operator and it revealed that your site or most of your important pages haven’t been crawled or indexed. This means that they won’t rank in any Google search, which can be frustrating.

In a situation like this, you might want to start by improving the on-page search engine optimization of the pages you want Google to index. That starts with removing things like old URLs with thin content, duplicate URLs, special promo-code pages, test pages, and other less relevant content. Next, structure the remaining pages with proper H1, H2, and H3 tags to reflect the hierarchy of information on your site; the lower the number of the heading tag, the more important it typically is. Then update the content on all the pages and make sure each page title is relevant. Also include a few internal links, since bots use internal links to navigate during the crawling process. These things will hopefully demonstrate activity and encourage the bot to crawl those pages in future cycles.
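
As a rough illustration, the hypothetical Python sketch below fetches a page and counts its heading tags and internal links, which is a quick way to sanity-check the structure described above. The URL and domain are placeholders, and a real audit would look at far more.

```python
# Quick, hypothetical audit sketch: count heading tags and internal links
# on a page. Illustrative only -- not a substitute for a full SEO audit.
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen


class PageAudit(HTMLParser):
    def __init__(self, site_domain):
        super().__init__()
        self.site_domain = site_domain
        self.headings = Counter()   # counts of h1, h2, h3 tags
        self.internal_links = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.headings[tag] += 1
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlparse(href).netloc
            # Relative links and links to the same domain count as internal
            if not host or host.endswith(self.site_domain):
                self.internal_links += 1


url = "https://www.example.com/"  # placeholder; point at the page to check
audit = PageAudit("example.com")
audit.feed(urlopen(url, timeout=5).read().decode("utf-8", "ignore"))

print("Headings:", dict(audit.headings))   # ideally exactly one h1
print("Internal links:", audit.internal_links)
```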

Assuming you have Google Search Console set up, it’s a great idea to submit your website’s sitemap. While Google doesn’t strictly need it, the sitemap is a file that lists every URL on your website that you want crawled, giving the search engine a map of how the site is organized.
You can also log into Google Search Console and request indexing of a specific URL manually. This often gets the page crawled and indexed much sooner than waiting for the next regular crawl cycle.
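
For illustration, here is a small Python sketch that builds a bare-bones sitemap.xml from a hypothetical list of URLs. Most sites generate this file automatically through their CMS or an SEO plugin and then submit it in Search Console.

```python
# Minimal sketch: generate a basic XML sitemap from a list of URLs.
# The URLs and output path are hypothetical examples.
from datetime import date
from xml.sax.saxutils import escape

pages = [
    "https://www.example.com/",
    "https://www.example.com/services",
    "https://www.example.com/blog/how-search-engines-work",
]

entries = "".join(
    f"  <url>\n"
    f"    <loc>{escape(url)}</loc>\n"
    f"    <lastmod>{date.today().isoformat()}</lastmod>\n"
    f"  </url>\n"
    for url in pages
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}"
    "</urlset>\n"
)

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)

print(sitemap)
```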

What Is A Crawl Budget?

A crawl budget is the average number of URLs a bot will crawl on your site before it leaves. This makes crawl budget optimization extremely important for getting your most relevant pages indexed, especially if you have a very large site with a thousand or more URLs.

In a case like this, you might want to use directives to keep bots from wasting their visit on unimportant pages. Let’s say, for example, you have an e-commerce site with thousands or perhaps tens of thousands of products. You can place a robots.txt file in the root directory of your website to suggest which parts of the site search engine bots should and should not crawl; this can also influence how quickly they crawl your site.
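
To see why this matters, here is a back-of-the-envelope sketch with entirely made-up numbers showing how much faster your important pages get recrawled once low-value URLs are excluded.

```python
# Back-of-the-envelope illustration of why crawl budget matters on large
# sites. All numbers here are made up purely for the example.
total_urls = 50_000          # every URL on a hypothetical e-commerce site
low_value_urls = 30_000      # filter/sort/session-parameter pages, etc.
crawl_budget_per_day = 500   # URLs the bot crawls per day on this site

days_full_site = total_urls / crawl_budget_per_day
days_important_only = (total_urls - low_value_urls) / crawl_budget_per_day

print(f"Recrawling everything: ~{days_full_site:.0f} days")
print(f"Blocking low-value URLs first: ~{days_important_only:.0f} days")
# Excluding the low-value URLs lets the bot revisit the pages that matter
# roughly 2.5x more often in this example.
```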

How Do Search Engine Algorithms Work?

Different search engines use their own in-house proprietary algorithms to rank the indexed information gathered by their crawling bots. While none of them wants to fully tip its hand, there are a few basic assumptions you can glean about the most popular search engines.

Google’s Algorithm

Google is the most popular of all the search engines and accounts for the lion’s share of searches around the world. Its crawlers seek out new and updated content frequently; just how often is something Google does not disclose, though like most bots they rely on links as their primary means of navigation. Typically, Google will scan sites it deems more important, such as a news website that makes frequent changes, more often than a site that rarely updates. Google loves content: the more unique, relevant content you put out there, the more often Google’s bots will come back to your site and serve that content to users who are looking for the best answer in the shortest amount of time. This is why it’s usually a sound marketing strategy to continually publish fresh, related content in the form of blog articles, ideally daily or weekly at a minimum.

Once the crawled information is indexed, Google’s algorithm then carefully analyzes the website data, focusing on written content, images, and videos, as well as factoring in the technical site structure. Throughout this process, Google’s algorithm looks for positive and negative ranking signals. This includes things like keywords as well as website content freshness to accurately assess what any page is truly about.

Google then assesses five key factors and applies them to how a particular page or site is ranked.

This includes:

The Query Meaning

This is based on the intent behind the end user’s question, which is parsed using complex language models informed by past searches as well as user behavior.

Web Page Relevance

This is the second phase of the process, where the intent of the user’s search query is used to filter the applicable content of candidate web pages and determine which is most relevant. This is driven primarily by keyword analysis. Relevance can also be tied to the proximity of the business to the user’s location; for example, a site for a company located in Nebraska will be shown higher to a person searching on their phone in Nebraska.

Content Quality

The keywords are matched and the relevant content is prioritized based on the authority of a given website as well as how fresh and accurate the content is.

Web Page Usability

The algorithm then assigns ranking priority to websites based on criteria that assess ease of use. This can include things like site speed and mobile responsiveness.

Additional Context & Settings

This final step applies a variety of user engagement signals and settings that are specific to Google’s platform.

Bing’s Search Algorithm

Bing is Microsoft’s proprietary search engine, but Microsoft has published some of its underlying search technology as open source, which means outsiders can get a reasonable look at how its results are put together. That code is organized into two modules: the Index Builder and the Searcher.

The Index Builder

This is the code that categorizes website information into what Bing calls “vectors,” which are then organized and stored in Bing’s index.

The Searcher

This is the module Bing uses to make connections between individual search queries and the vectors in its index.

DuckDuckGo’s Search Algorithm

DuckDuckGo has made a strong name for itself through its focus on data privacy. To that end, it specifically does not capture information about its searchers, which is something Google, Bing, and others most certainly do.

It’s also worth noting that DuckDuckGo has a feature called “bangs” that lets the user add a short shortcut to a query to completely bypass DuckDuckGo’s own results page and search another site directly. At that point, DuckDuckGo behaves as a search portal for other platforms like Wikipedia, Amazon, and even Twitter.

YouTube’s Search Algorithm

YouTube is one of the most popular video hosting sites in the world, if not the most popular. Since it is owned by Google, its search engine functions in much the same way, though there are some customized features tailored to the needs of a video-rich platform. This includes breaking content down by three primary criteria.

Scale

With roughly 1.3 billion users, YouTube first parses content by scale, which helps it filter down to the most applicable data.

Freshness

YouTube balances the videos and other content it recommends based on how recently a video was uploaded, which is then weighed against the individual user’s past behavior.

Noise

YouTube’s algorithm then weighs what is most relevant at any given time against the user’s previously established preferences. This further helps filter out less relevant options that the user would likely ignore anyway.

Conclusion

Keep in mind this is a simplified overview meant to give just a taste of how search engines work. Every minute detail can be examined for hours on end, and conferences around the world last for days going over how best to navigate the complex ranking systems and come out on top.

Google itself is in perpetual beta, constantly changing and testing its ranking factors, not only in how it decides to rank one site’s information over another, but also in how that information is displayed to each user depending on the query phrasing and intent, the device, and the location the user is searching from.