AngularJS SEO – Single Page App Crawling and Indexing

This guide was last updated on 25 September 2017.

The test below applies to AngularJS v1.x. If you’re using Angular v2+, this does not apply to you (assuming you have set up server-side rendering).

In our research, we found that the overwhelming opinion on the internet is that if you have an AngularJS app and you want it to be crawled and indexed by Google, you need to pre-render the app, generate HTML snapshots, and do other things which take time to develop and maintain.

tl;dr: Our research, empirical testing, and communication with Google representatives show that this advice is partially wrong – you do not need to serve different or pre-rendered content to Google. In fact, as long as you’re following regular SEO conventions within your app (e.g. using a link element for a link instead of a weird click hook), Google can crawl, render, and index your AngularJS app just fine – but it is much riskier and takes several times longer than plain HTML.

Disclaimer: This is a theoretical concept. Our tests show that Google can crawl, render, and index a pure-AngularJS site just fine when it’s set up properly. However, correct crawling and indexing will take much, much longer than it does for conventional pages, and other search engines and crawlers are unlikely to be able to correctly crawl your content. In a pure-JS environment, you won’t get indexed by Bing, Yahoo, or Yandex; and social networks such as Facebook, Twitter, Pinterest, and many others won’t have any clue what your pages are about. In 2017, you shouldn’t implement a pure-JS architecture on a production site if you care about traffic from external sources. We highly recommend that you prerender your AngularJS website with an external service such as url.Render().

So how do we build an SEO friendly AngularJS single page app without relying on pre-rendered HTML?

Avoid things that Google (probably) can’t handle

The things that Google can’t handle within AngularJS single page apps are the same things that Google can’t handle on regular flat HTML pages. Follow some common sense guidelines:

Don’t use hash fragments to define the URL

Out of the box, AngularJS serves every page as a hash fragment on the app URL, e.g.

  • http://www.domain.com/app
  • http://www.domain.com/app#/page-1
  • http://www.domain.com/app#/page-2

This is the same whether we’re talking about AngularJS or a Wikipedia page – Google might use hashes (URL fragments) to allow users to jump to the right content, but they normally won’t index each URL fragment as a discrete page.

Instead you need to use proper, hard URLs. Things like:

  • http://www.domain.com/app/
  • http://www.domain.com/app/page-1
  • http://www.domain.com/app/page-2

To achieve this, you need to enable “HTML5” mode in your Angular app, and tell your webserver to forward requests for non-existent URLs to your app.

Set HTML5 mode in AngularJS – add the following to your app configuration script (we learned this at scotch.io):

$locationProvider.html5Mode(true);
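If it helps, here’s a slightly fuller sketch of that configuration (the module name is a placeholder, and note that HTML5 mode in AngularJS 1.3+ also expects a <base href="/"> tag in your index.html unless you disable requireBase):

// Hypothetical app config – enable HTML5 (pushState) URLs instead of hash URLs
angular.module('app').config(['$locationProvider', function ($locationProvider) {
  $locationProvider.html5Mode(true);
  // Or, if you'd rather not add <base href="/"> to index.html:
  // $locationProvider.html5Mode({ enabled: true, requireBase: false });
}]);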

Tell your webserver to forward all requests to the AngularJS app – this is the same method you’ll use to make any CMS work. In Apache, we do something like:

RewriteEngine On
# Serve real files and directories as-is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
# Everything else goes to the Angular app's entry point
RewriteRule ^ /APPURL.html

This should forward all requests for “hard” URLs to the Angular app, which (because of HTML5 mode) now accepts URLs in this format.

Don’t use weird linking elements

When crawling plain HTML pages, Google normally doesn’t follow links that use a form element or a custom Javascript hook. The same is true for AngularJS.

Our tests show that Google can discover and crawl links which are plain link elements (<a href="">). This is the case both for links which are coded into the app (e.g. navigation links in the main template), and for links which have to be rendered by the client (e.g. by retrieving plain HTML from an external file, or building the link with a series of Javascript definitions).

The danger when building AngularJS apps is the temptation to get fancy and use weird linking techniques – maybe you render a <span> element, then attach a click hook which fires a pushState(). Google (probably) can’t follow these links (and AngularJS does a good job of loading the content from a plain link anyway, so I’m not sure why you would want to). Don’t do this.
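As a rough illustration (the URL and handler name are made up), the first link below is the kind of thing Google can follow; the second is the kind of thing it probably can’t:

<!-- Crawlable: a plain anchor with a real href (Angular still intercepts the click and routes it client-side) -->
<a href="/app/page-1">Page 1</a>

<!-- Risky: no href, navigation only happens via a JS click hook -->
<span ng-click="goToPage('page-1')">Page 1</span>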

Don’t block your resources

This should be obvious – if you’ve blocked some essential script or data resource from being crawled by robots, then robots won’t be able to render your page properly.
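As a hypothetical example, a robots.txt like the one below would stop Googlebot from fetching the app’s scripts and data, so it could never render the pages properly (the paths are made up):

User-agent: *
# Don't do this on a JS-rendered site – blocked scripts and data mean blank pages for Googlebot
Disallow: /scripts/
Disallow: /api/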

Our crawlable AngularJS single page app

To test whether Googlebot can actually crawl, render, and index AngularJS pages without any help, we set up a single page app with the following rules:

  1. URLs are flat, clean URLs – no parameters or fragments (hashes) are used for navigation
  2. Some links are within the main Angular HTML (really, Google should crawl these regardless of whether it renders anything), and some are only accessible after rendering a page
  3. All pages have unique metadata – page title, meta description, canonical URL, and robots directives (see the sketch after this list)
  4. All pages have a few hundred words of fresh content on a unique (albeit nonsensical) topic to encourage Google to add the URLs to its index (just in case one day, someone decides to buy a “laptop coat rack barge”)
  5. Some pages have the noindex directive – we want Google to crawl these, but not add them to the index
  6. No sitemap of URLs is provided to Google – we only notified them of the homepage of our app via Fetch and Render
  7. We log all HTTP requests for both the main app and the content pages so that we can tell when Google has crawled each of the URLs.
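For rule 3, a common pattern (this is a generic sketch, not the exact code from our test app – the route, file names, and metadata values are placeholders) is to attach metadata to each route definition and copy it into the document head whenever the route changes:

// Attach per-page metadata to each route, then update the head on every route change
angular.module('app')
  .config(['$routeProvider', function ($routeProvider) {
    $routeProvider.when('/laptop-coatrack-barge', {
      templateUrl: 'views/laptop.html',
      // Custom route properties are passed through to $routeChangeSuccess
      meta: {
        title: 'Laptop Coatrack Barge – AngularJS Test',
        description: 'A few hundred words about laptop coatrack barges.',
        robots: 'index,follow'
      }
    });
  }])
  .run(['$rootScope', function ($rootScope) {
    $rootScope.$on('$routeChangeSuccess', function (event, current) {
      var meta = (current && current.meta) || {};
      // Assumes empty <meta name="description"> and <meta name="robots"> tags already exist in index.html
      document.title = meta.title || 'AngularJS Test';
      document.querySelector('meta[name="description"]').setAttribute('content', meta.description || '');
      document.querySelector('meta[name="robots"]').setAttribute('content', meta.robots || 'index,follow');
    });
  }]);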

We put the app online at http://angular.abertram.com/


This site contains five pages:

  • /
    • links to insult, applause, and laptop
  • /insult-hosepipe-salmon
    • contains a “noindex” robots directive
  • /applause-rudder-teapot
    • links to /orphan-purple-roadworks
  • /laptop-coatrack-barge
  • /orphan-purple-roadworks

You can check the titles, descriptions, canonicals, and robots directives by inspecting the DOM of each page (‘view source’ isn’t helpful here, because it only shows the raw HTML before the JavaScript has run).


So, all going according to plan, Google should crawl this app and index the following pages:

  • /
  • /applause-rudder-teapot
  • /laptop-coatrack-barge
  • /orphan-purple-roadworks

I’m not going to go through the contents of the code; if you want to see it, you can get the source code at the end of this post.

We put the site live and submitted it to Google by verifying it in Search Console and using the Fetch tool. The early results were very positive – the homepage got indexed within minutes and all of the (dynamically set) page metadata was being pulled into the search result page. Google was even recognising that the “brand” of this subdomain is “AngularJS Test” and moving that to the front of the string like they do for ‘real’ homepages.


However, after hours of waiting, our logs showed that no other pages or content had even been crawled.

I spoke to Google about this

I reached out to John Mueller at Google – was my hypothesis even correct? Is there some kind of special JS crawling bot (like we saw with mobile crawling) that hasn’t had a chance to get to this site yet? Was Googlebot ignoring every other page because this is a brand new subdomain with no link equity? His response was basically: yes, Google should be able to crawl and index this site; maybe try linking to the app to encourage crawling.

I’m pretty confident that’ll work — but like you mentioned, this is a very artificial setup so things might not be as “normal” in processing as otherwise.
We can crawl, render & index that page, we see the links, so theoretically we can get the rest too.
You might see some speed-up with Fetch as Google + submit page & linked pages to index, if you want to give that a shot, but if you’re patient I’m sure it’ll catch up too. Another thing that might help is to have a link to the homepage from somewhere, so that we’re more encouraged to look at the site.
I’m curious to see how it works out, in any case, so I’ll avoid doing anything from my side.

With this news I doubled down and added a couple of links to this app from other sites which I own. Then we waited. Two weeks later we had a result.

What happened?

As expected, Google crawled and indexed the app.

Our crawl log showed that Google had been requesting the pages, Google had started to index the pure-JS pages, and these pages were ranking for their generic terms.

We monitored these pages for the next six months – they constantly switched between being correctly indexed and being indexed with only the raw (unrendered) content; pages occasionally dropped out of the index, then came back a few weeks later.

Conclusions

It’s clear that Google can crawl, render, and index AngularJS apps which have been built to be crawled. One could argue that there’s no need to undertake expensive and time-consuming pre-rendering, or to do weird search engine cloaking for Google – but you probably shouldn’t skip pre-rendering: the other big search engines will not respond as well; neither will social network crawlers, nor the AdWords crawler that extracts info to build your Quality Score. Build for your core markets.

If you rely on search engines for traffic, you should be prerendering your website.

In this test, I omitted creating an XML sitemap as it would only have made the results less clear. In a production environment, you should absolutely generate a sitemap containing every canonical URL, submit it to Search Console, and reference it in your /robots.txt file.
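The robots.txt reference is just a single line (the domain here is a placeholder):

Sitemap: https://www.domain.com/sitemap.xml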

Get our source code

If you’re reading this page, chances are you’re developing an Angular app and would find the source code of our test useful. You can get the source code of this app.