Search engine optimization is not only about keywords, links, and content. The technical part of SEO can make the difference for your entire digital marketing campaign.
How and Why to Get Google to Index a Website
2.5 Lecture Highlights and Notes
Why? Because you want search engines to index and rank your website's pages as soon as possible so they can generate traffic to your site. Without indexing, your site won't show up in search results at all. Organic search accounts for over half of all site traffic on average, and up to 64%, compared with roughly 5% from social media (source: Conductor.com, 2014).
How do you get your new site or blog indexed by Google, Bing and other search engines?
- Sit back and wait for it to happen naturally.
- Or make it happen now!
Step 1: Understand How Search Engines Work
Indexing is simply the spider's way of gathering and processing all the data from pages and sites as it crawls the web. The spider notes new documents and changes, which are then added to Google's searchable index, as long as those pages contain quality content and don't trigger alarm bells by violating Google's user-oriented guidelines. The spider processes the text content on a page as well as where on the page search terms appear, and it also analyzes title tags and alt attributes for images. That's indexing.
The spider starts with pages that have already been indexed via earlier crawl sessions. Next, it adds in sitemap data. Finally, the spider finds and uses links on pages that it’s crawling and adds those linked-to pages to the list of pages to be crawled.
Step 2: Add a Blog
Blog content gets crawled and indexed more quickly than static pages.
Step 3: Use Robots.txt
What it is:
The robots.txt file is a basic plain-text file that should reside in the root directory of your domain. It gives search robots (also known as web wanderers, crawlers, or spiders) strict instructions about which pages they may crawl and index, and which pages to stay away from.
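For example, a minimal robots.txt that lets every crawler visit everything looks like this (an empty Disallow value blocks nothing):

```
# Applies to all crawlers; an empty Disallow means "block nothing"
User-agent: *
Disallow:
```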
Why it's important: As the later lectures stress, a single misconfigured rule in robots.txt can block crawlers from your entire site, so getting this file right matters for indexing.
Step 4: Create a Content Strategy
A well-thought-out and written content marketing plan helps you avoid getting tripped up in the mad rush to publish more content.
- What are your goals? Specify SMART goals and how you’ll measure your progress (i.e., metrics).
- Who is your target audience? Customer profiles or personas are essential to understanding your audience and what they want/need.
- What types of content will you produce? Here, too, you want to make sure you’re delivering the content types that your target audience most wants to see.
- Where will it be published? Of course, you’ll be hosting your own content on your new site, but you may also want to reach out to other sites or utilize platforms such as YouTube, LinkedIn and Slideshare.
- How often will you publish your content? It’s far better to produce one well-written, high-quality article a week consistently than to publish every day for a week, then publish nothing for a month.
- What systems will you adopt for publishing your content?
Step 5: Create and Submit a Sitemap
What it is:
The sitemap is a list (in XML format) of all the pages on your site. Its primary function is to let search engines know when something’s changed, either a new page or changes to an existing page, as well as how often the search engine should check for changes. Sitemaps help your site get indexed more quickly.
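A minimal sitemap.xml, for illustration (the URL and date below are placeholders), follows the standard sitemap protocol:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- One <url> entry per page; loc is required, the rest optional -->
    <loc>https://www.example.com/</loc>
    <lastmod>2020-01-15</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```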
How to create a sitemap:
If you’re using WordPress: simply install and use the Google XML Sitemaps plugin. Its settings allow you to instruct the plugin on how frequently a sitemap should be created, updated and submitted to search engines. It can also automate the process for you, so that whenever you publish a new page, the sitemap gets updated and submitted automatically.
How to submit your sitemap:
Use Google Webmaster Tools:
- Log in to your Google account
- Add your new site’s URL to Webmaster Tools by clicking the “Add a Property” button on the right.
- In the popup box, enter your new site’s URL and click the “continue” button.
- Follow Google’s instructions to add an HTML file that Google creates for you, link your new site through your Analytics account or choose from another of the options Google will outline.
- Once your site has been added to Google’s Webmaster Tools dashboard, simply click the URL to go to the Dashboard for that site.
- On the left, under “Crawl,” click “Sitemaps” then in the upper right corner click “Add/Test Sitemap.”
Step 6: Install Google Analytics
Installing Google Analytics may help trigger the crawling and indexing process. It is also an awesome tool for analyzing site data and strategizing your marketing plan.
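For reference, Google's standard tracking snippet goes just before the closing head tag of every page; GA_MEASUREMENT_ID below is a placeholder for your own property ID:

```html
<!-- Global site tag (gtag.js); replace GA_MEASUREMENT_ID with your ID -->
<script async src="https://www.googletagmanager.com/gtag/js?id=GA_MEASUREMENT_ID"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', 'GA_MEASUREMENT_ID');
</script>
```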
Step 7: Submit Website URL to Search Engines
To submit your site URL to Google:
- Log in to your Google account and navigate to Submit URL in Webmaster Tools.
- Enter your URL, check the “I’m not a robot” box and then click the “Submit Request” button. (You can do this with Bing as well.)
Step 8: Create or Update Social Profiles
Search engines pay attention to social signals, which can prompt them to crawl and index your new site, and which may also help your pages rank higher in the search results.
Make sure your new or existing social profiles link to your website.
Step 9: Share Your New Website Link
Link to your site or blog through your own social status updates:
- Pinterest – select a high-quality image or screenshot from your new site. Add the URL and an optimized description (i.e., make sure you use appropriate keywords) and pin it to either an existing board or a new one you create for your site.
- YouTube – Record a short screencast video introducing your site and highlighting its features and benefits. Then add the URL in the video description.
- Email – If you have an existing email list from another site that’s related to the same niche as your new site, you can send out an email blast to the entire list introducing your new site and including a link. Add your new URL and site name to your email signature.
Step 10: Set Up Your RSS Feed
RSS generally helps increase readership and conversion rate but it can also help get your pages indexed.
It stands for Really Simple Syndication or Rich Site Summary.
To users, RSS feeds deliver a much easier way to consume a large amount of content in a shorter amount of time. Site owners get instant publication and distribution of new content, plus a way for new readers to “subscribe” to that content as it’s published.
Setting up your RSS feed with Feedburner (Google’s own RSS management tool) helps notify Google that you have a new site or blog that’s ready to be crawled and indexed.
RSS will also let Google know whenever you publish a new post or page which Google needs to index.
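A minimal RSS 2.0 feed, for illustration (all titles, links, and dates below are placeholders), looks like this:

```xml
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://www.example.com/</link>
    <description>Posts from a hypothetical new site</description>
    <!-- One <item> per post; readers and crawlers see new items as they appear -->
    <item>
      <title>Hello World</title>
      <link>https://www.example.com/hello-world/</link>
      <pubDate>Wed, 15 Jan 2020 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```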
Step 11: Submit to Blog Directories
- Submitting your new URL to blog directories can help your site “get found” by new potential users.
- AND it can also help indexing take place more rapidly — if you go about it the right way.
Make sure you only submit to decently ranked and authoritative directories. Good examples to start with:
- Alltop subdomain for your niche or industry
Submitting to high quality sites with decent Domain Authority ratings can open your content up to a new audience, and provide incoming links that can nudge the search engines to crawl and index your site.
Why Google Isn’t Indexing a Site
2.6 Lecture Highlights and Notes from the Source: SearchEngineJournal.com
What to do if your site isn’t being indexed:
Google must index your site in order for your site to get any organic traffic from Google. Indexation is the keystone of good SEO, so if your site or certain pages of your site aren’t indexing, you need to figure out why. The first step to fixing an indexing issue is diagnosing it.
To diagnose your indexing issue, work through the following list from top to bottom (from most common to least common) to find your cause and cure.
1. Your Site is Indexed Under a www- or Non-www Domain
Technically, www is a subdomain, so http://example.com is not the same as http://www.example.com. Make sure you add both versions to your Google Webmaster Tools (GWT) account so both are covered, verify ownership of both, and set your preferred domain.
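If Apache is your web server, one common way to enforce the preferred domain is a 301 redirect in .htaccess. This is a sketch, not the only approach, and the domain is a placeholder:

```
# Redirect the bare domain to the www version with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```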
2. Google Hasn’t Found Your Site Yet
This is usually a problem with new sites. Give it a few days (at least), but if Google still hasn’t indexed your site, make sure your sitemap is uploaded and working properly. If you haven’t created or submitted a sitemap, this could be your problem. You should also request Google crawl and fetch your site. Here are Google’s instructions on how to do that:
- On the Google Search Console Home page, click the site you want.
- On the Dashboard, under Crawl, click Fetch as Google.
- In the text box, type the path to the page you want to check.
- In the drop-down list, select Desktop. (You can select another type of page, but currently we only accept submissions for our Web Search index.)
- Click Fetch. Google will fetch the URL you requested. It may take up to 10 or 15 minutes for Fetch status to be updated.
- Once you see a Fetch status of “Successful”, click Submit to Index, and then click one of the following:
- To submit the individual URL to Google’s index, select URL and click Submit. You can submit up to 500 URLs a week in this way.
- To submit the URL and all pages linked from it, click URL and all linked pages. You can submit up to 10 of these requests a month.
3. The Site or Page(s) are Blocked With robots.txt
Another common problem is that your developer or editor has blocked the site using robots.txt. This is an easy fix: remove the blocking entry from robots.txt, and your site will reappear in the index. (Robots.txt is covered in more detail later in these notes.)
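For reference, the accidental "block everything" entry usually looks like this; deleting these two lines (or narrowing the Disallow path) lifts the block:

```
# Disallow: / blocks ALL crawlers from the ENTIRE site
User-agent: *
Disallow: /
```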
4. You Don’t Have a sitemap.xml
If you are experiencing indexation issues on any portion of your site, I recommend that you revise and resubmit your sitemap.xml just to make sure.
5. You Have Crawl Errors
In some cases, Google will not index some pages on your site because it can’t crawl them. Even though it can’t crawl them, it can still see that they exist (for example, through links or your sitemap), which is why they surface as errors.
To identify these crawl errors, go to Google Webmaster Tools → Select your site, → Click on “Crawl” → Click on “Crawl Errors”. If you have any errors, i.e., unindexed pages, you will see them in the list of “Top 1,000 pages with errors.”
6. You Have Lots of Duplicate Content
Too much duplicate content on a site can confuse search engines and make them give up on indexing your site. If multiple URLs on your site return exactly the same content, you have a duplicate-content issue. To correct it, pick the page you want to keep and 301-redirect the rest to it.
You can also look into canonicalizing pages, but be careful: a misconfigured canonical setup can itself prevent indexation.
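For the canonical option, a sketch: on each duplicate URL, a tag in the page's head points to the one page you want indexed (the URL below is a placeholder):

```html
<!-- Tells search engines which URL is the preferred version of this content -->
<link rel="canonical" href="https://www.example.com/preferred-page/">
```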
7. You’ve Turned On Your Privacy Settings
If you have a WordPress site, you may have accidentally left the privacy settings on. Go to Admin → Settings → Reading and make sure “Discourage search engines from indexing this site” is unchecked (in older WordPress versions this setting lived under Settings → Privacy).
8. The Site is Blocked by .htaccess
Your .htaccess file is a configuration file for the Apache web server, stored alongside your site’s files on the server. Although .htaccess is handy and useful, it can also be used to block crawlers and prevent indexation.
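For example, rules like the following (a hypothetical Apache mod_rewrite sketch) would return 403 Forbidden to Googlebot; if you find anything similar in your .htaccess and you want the site indexed, remove it:

```
# These rules BLOCK Googlebot: any request with a Googlebot
# user agent gets a 403 Forbidden response
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]
```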
9. The Site Has NOINDEX in the Meta Tag
Another way of saying “no” to the robots, and thus preventing indexation, is a noindex meta tag. Do you have a noindex tag on your home page causing the issue? These tags are sometimes hard to spot because of redirects, so use an HTTP header checker tool to inspect the page before any redirects fire. The tag often looks like this:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
Remove this line of code, and you’ll be back in the index in no time.
10. Your Site Takes Forever to Load
Google doesn’t like it if your site takes an eternity to load. If the crawler encounters interminable load times, it will likely not index the site at all.
11. You Have Hosting Down Times
If the crawlers can’t access your site, they won’t index it.
Check your connectivity. If your host has frequent outages, it could be that the site isn’t getting crawled. Get a new host if this is the case.
12. You Got Deindexed
This one is really bad… If you got hit with a manual penalty and removed from the index, you probably already know about it. If you have a site with a shady history (that you don’t know about) it could be that a lurking manual penalty is preventing indexation. If your site has dropped from the index, you’re going to have to work very hard to get it back in.
This article won’t attempt to discuss all the reasons for a manual penalty; for that, see Eric Siu’s post on the topic. If you are penalized, I advise you to do everything within your power to recover from the penalty. Finally, I recommend that you play a defensive game to prevent any further penalties.
Avoiding Mistakes with the Robots.txt file
Lecture 2.7 Highlights and Notes, Sourced from SearchEngineJournal
Best Practices for Setting Up Meta Robots Tags and Robots.txt
What Is Robots.txt?
Robots.txt is a text file that is used to instruct search engine bots (also known as crawlers, robots, or spiders) how to crawl and index website pages. Ideally, a robots.txt file is placed in the top-level directory of your website so that robots can access its instructions right away.
What to Hide With Robots.txt
Robots.txt files can be used to exclude certain directories, categories, and pages from search. To that end, use the “disallow” directive.
Here are some pages you should hide using a robots.txt file:
- Pages with duplicate content
- Pagination pages
- Dynamic product and service pages
- Account pages
- Admin pages
- Shopping cart
- Thank-you pages
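A sketch of a robots.txt file hiding the kinds of pages listed above (all paths are hypothetical; adjust them to your own site's structure):

```
# Keep utility and private pages out of search
User-agent: *
Disallow: /cart/
Disallow: /account/
Disallow: /admin/
Disallow: /thank-you/
```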
How to Use Robots.txt
Robots.txt files are pretty flexible and can be used in many ways. Their main benefit, however, is that they enable SEO experts to “allow” or “disallow” multiple pages at once without having to access the code of every page, one by one.
Note: You can also submit your robots.txt file manually to Google Search Console and, if you target Bing, to Bing Webmaster Tools. This is a safer approach, as it helps protect your content from being copied by webmasters of competitor sites.
Even though robots.txt structure and settings are pretty straightforward, a properly set up file can make or break your SEO campaign. Be careful with settings: you can easily “disallow” your entire site by mistake and then wait for traffic and customers in vain.
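Before deploying a robots.txt file, you can sanity-check its rules programmatically. A minimal sketch using Python's standard urllib.robotparser; the rules and URLs here are hypothetical placeholders:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents to verify before deploying
rules = """\
User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Ordinary pages remain crawlable; anything under /admin/ is blocked
print(rp.can_fetch("*", "https://www.example.com/blog/post"))    # True
print(rp.can_fetch("*", "https://www.example.com/admin/login"))  # False
```

Running a quick check like this catches the "accidentally disallowed the whole site" mistake before crawlers ever see the file.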
What Are Meta Robots Tags?
Meta robots tags (REP tags) are elements of an indexer directive that tell search engine spiders how to crawl and index specific pages on your website. They enable SEO professionals to target individual pages and instruct crawlers with what to follow and what not to follow.
How to Use Meta Robots Tags?
There are only four major tag parameters:
- index, follow: allow search bots to index a page and follow its links
- noindex, nofollow: prevent search bots from indexing a page and following its links
- index, nofollow: allow search engines to index a page but hide its links from search spiders
- noindex, follow: exclude a page from search but allow its links to be followed (the link equity passed can still help rankings)
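Written out as HTML, the four combinations above look like this (these tags belong inside the page's head section):

```html
<!-- Allow indexing and link following (also the default behavior) -->
<meta name="robots" content="index, follow">
<!-- Block both indexing and link following -->
<meta name="robots" content="noindex, nofollow">
<!-- Index the page, but don't follow its links -->
<meta name="robots" content="index, nofollow">
<!-- Keep the page out of search, but follow its links -->
<meta name="robots" content="noindex, follow">
```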
In four simple steps, you can take your website indexation process up a level:
- Access the code of a page by pressing CTRL + U.
- Copy and paste the <head> part of a page’s code into a separate document.
- Provide step-by-step guidelines to developers using this document. Focus on how, where, and which meta robots tags to inject into the code.
- Check to make sure the developer has implemented the tags correctly. I recommend using The Screaming Frog SEO Spider to do so.
Meta robots tags are recognized by all major search engines: Google, Bing, Yahoo, and Yandex. You don’t have to tweak the code for each individual search engine or browser, unless a particular engine honors additional, engine-specific tags.
If your site runs on an advanced CMS (OpenCart, PrestaShop) or uses specific plugins (like WP Yoast), you can also inject meta tags and their parameters straight into page templates.
Basic Rules for Setting Up Robots.txt and Meta Robots Tags
Knowing how to set up and use a robots.txt file and meta robots tags is extremely important. A single mistake can spell death for your entire campaign.
I personally know several digital marketers who have spent months doing SEO only to realize that their websites were closed from indexation in robots.txt. Others abused the “nofollow” tag so much that they lost backlinks in droves.
Dealing with robots.txt files and REP tags is pretty technical, which can potentially lead to many mistakes. Fortunately, there are several basic rules that will help you implement them successfully.
- Place your robots.txt file in the top-level directory of your website code to simplify crawling and indexing.
- Structure your robots.txt properly, like this: User-agent → Disallow → Allow → Host → Sitemap. This way, search engine spiders access categories and web pages in the appropriate order.
- Make sure that every URL you want to “Allow:” or “Disallow:” is placed on an individual line. If several URLs appear on one single line, crawlers will have a problem accessing them.
- Use lowercase for the file name: “robots.txt” is always better than “Robots.TXT,” because file names are case sensitive on many servers.
- Don’t separate query parameters with spaces. For instance, a line like “/cars/ /audi/” would cause mistakes in the robots.txt file.
- Don’t use any special characters except * and $. Other characters aren’t recognized.
- Create separate robots.txt files for different subdomains. For example, “hubspot.com” and “blog.hubspot.com” have individual files with directory- and page-specific directives.
- Use # to leave comments in your robots.txt file. Crawlers treat lines starting with # as comments and ignore them.
- Don’t rely on robots.txt for security purposes. Use passwords and other security mechanisms to protect your site from hacking, scraping, and data fraud.
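Putting the rules above together, a sketch of a file that follows the User-agent → Disallow → Allow → Host → Sitemap ordering (the paths and domain are placeholders, and note that the Host directive is honored mainly by Yandex):

```
# Comment lines start with "#" and are ignored by crawlers
User-agent: *
Disallow: /private/
Allow: /private/annual-report.html
Host: www.example.com
Sitemap: https://www.example.com/sitemap.xml
```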
Meta Robots Tags
- Mind the case. Google and other search engines recognize attributes, values, and parameters in both uppercase and lowercase, and you can switch between the two, but I strongly recommend sticking to one option to improve code readability.
- Avoid multiple robots <meta> tags, which can create conflicts in the code. Instead, combine values in a single <meta> tag, like this: <meta name="robots" content="noindex, nofollow">.
- Don’t use conflicting meta tags, which cause indexing mistakes. For example, if your code contains both <meta name="robots" content="follow"> and <meta name="robots" content="nofollow">, only “nofollow” will be taken into account, because robots put restrictive values first.
Note: You can easily implement both robots.txt and meta robots tags on your site. However, be careful to avoid confusion between the two.
The basic rule here is that restrictive values take precedence. So if you “allow” indexing of a specific page in robots.txt but accidentally “noindex” it in the <meta> tag, spiders won’t index the page.
Also, remember: if you want to give instructions specifically to Google, use the “googlebot” <meta> name instead of “robots”, like this: <meta name="googlebot" content="nofollow">. It works like “robots” but applies only to Google’s crawler, leaving all the other search crawlers unaffected.
In Lectures 2.5, 2.6, and 2.7, there are many tips from experts about how to get Google to index a site. To gain a better understanding of this advice, complete the following instructions:
- Describe how the files “robots.txt” and “sitemap.xml” facilitate website indexing by search engines (a minimum of 200 words). Be sure to draw on the information presented in Lectures 2.5, 2.6, and 2.7.
How robots.txt and sitemap.xml facilitate website indexing by search engines: