RankBrain - What it is and why you shouldn't care about it

Discussion in 'SEO and Marketing' started by cardine, Aug 10, 2016.


  1. cardine

    cardine Administrator Staff Member

    I've heard a lot of people ask about RankBrain and I've also seen a lot of people spew absolute BS about RankBrain. Some say it penalizes links, some say it analyzes content quality, and pretty much anyone who has ever said anything about RankBrain and what you need to do about it is completely wrong. So I wanted to make this post to clear up all of this misinformation and explain what RankBrain is, what it does, and what it means to SEOers.

    I think I am especially qualified to do this because we have been doing a lot of research into the same areas of AI that are being used for RankBrain, so I can speak both from what I know about RankBrain and from what I know about this specific field of AI in general.

    In short, what has changed with RankBrain is that it takes advantage of a new-ish (very neural network centric) branch of technology called deep learning. One of the core concepts in deep learning as it relates to natural language processing is that each word is represented by a sequence of numbers, and through some clever algorithms you can make it so that similar words end up with similar patterns of numbers. For instance the pattern of numbers for "car" and "bike" will be much closer than the pattern of numbers for "car" and "computer".
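    To make that concrete, here's a toy sketch in Python. The vectors below are numbers I made up purely for illustration (real embeddings are learned from data and have hundreds of dimensions), but they show what "similar words have a similar pattern of numbers" means in practice:

```python
import math

# Hypothetical "patterns of numbers" for a few words -- hand-made for
# illustration only; real systems learn these vectors from huge text corpora.
embeddings = {
    "car":      [0.9, 0.8, 0.1, 0.0],
    "bike":     [0.8, 0.7, 0.2, 0.1],
    "computer": [0.1, 0.0, 0.9, 0.8],
}

def cosine_similarity(a, b):
    """How alike two patterns of numbers are (closer to 1.0 = more alike)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["car"], embeddings["bike"]))      # ~0.99, very close
print(cosine_similarity(embeddings["car"], embeddings["computer"]))  # ~0.12, far apart
```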

    RankBrain takes advantage of this by building algorithms where, instead of treating every word as its own isolated entry in a database (so-called one-hot vectors, where no word has anything in common with any other word), there is a database of patterns that represent each word. So now when you search "auto insurance" it is very close to searching "car insurance", and similarly when Google sees a website that talks about "auto insurance" it knows that is basically synonymous with "car insurance".
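    To see why the one-hot approach is so limiting, here's the same contrast as a quick sketch (the dense numbers are invented for illustration). With one-hot vectors, "auto" and "car" have literally nothing in common; with learned patterns they overlap heavily:

```python
# Old style: one-hot vectors. Each word gets its own slot, so any two
# different words have zero overlap -- "auto" and "car" look unrelated.
one_hot = {
    "auto":     [1, 0, 0],
    "car":      [0, 1, 0],
    "computer": [0, 0, 1],
}

# New style: dense patterns (made-up numbers). Similar words share structure.
dense = {
    "auto":     [0.88, 0.79, 0.12],
    "car":      [0.90, 0.80, 0.10],
    "computer": [0.10, 0.05, 0.95],
}

def overlap(a, b):
    """Dot product: a crude measure of how much two patterns overlap."""
    return sum(x * y for x, y in zip(a, b))

print(overlap(one_hot["auto"], one_hot["car"]))   # 0 -- no relationship at all
print(overlap(dense["auto"], dense["car"]))       # ~1.44 -- clearly related
print(overlap(dense["auto"], dense["computer"]))  # ~0.24 -- not very related
```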

    Similarly RankBrain (if their algorithms look anything like what they should look like based on research in this field) should be able to take an entire webpage and summarize it into a pattern of numbers. So if there is an article about great car insurance deals, it should produce a pattern of numbers that sits somewhere between the patterns for "car", "insurance", and "deals". In other words, the article is represented as an average of the major keywords it refers to. This also means that this pattern (called a vector) ends up very similar to the vectors for near matches like "auto" and "bargains". This technology is very similar to what we currently employ when we scan the internet trying to ascertain the meaning of different articles.

    So... enough backstory. Here's what RankBrain does in a single sentence:
    RankBrain takes a phrase and turns it into a pattern of numbers, takes a paragraph (or an entire webpage) and turns that into a pattern of numbers, and then compares how similar (cosine similarity) those two patterns are, which gives Google a very strong indication of how related a webpage is to a query. That means a site that talks a lot about great car insurance deals could still score as very relevant to the keyword "auto insurance bargains" despite never having the word "auto" or "bargain" on the webpage at all.
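    Putting the last few paragraphs together, here's a rough end-to-end sketch of that one sentence. Everything here is an assumption made for illustration (the tiny vocabulary, the made-up vectors, simple averaging as the way to summarize a page); the point is just that a page about "great car insurance deals" can score as highly relevant to the query "auto insurance bargains" even though "auto" and "bargains" never appear on it:

```python
import math

# Made-up patterns for a tiny vocabulary; real systems learn these from
# enormous amounts of text and use far more dimensions.
embeddings = {
    "auto":      [0.90, 0.10, 0.10, 0.05],
    "car":       [0.88, 0.12, 0.08, 0.06],
    "insurance": [0.10, 0.90, 0.15, 0.05],
    "bargains":  [0.05, 0.10, 0.90, 0.10],
    "deals":     [0.06, 0.12, 0.88, 0.12],
    "great":     [0.10, 0.05, 0.30, 0.80],
    "keyboard":  [0.05, 0.05, 0.05, 0.90],
}

def embed(text):
    """Summarize a phrase or page as the average of its word patterns."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

query = embed("auto insurance bargains")
page  = embed("great car insurance deals")   # never mentions "auto" or "bargains"
other = embed("great keyboard deals")        # a page about something else

print(cosine(query, page))    # ~0.94 -- relevant even without "auto" or "bargains"
print(cosine(query, other))   # ~0.53 -- much less relevant, different topic
```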

    What RankBrain DOESN'T Do
    RankBrain in no way gauges the quality of an article or whether it is spammy or anything else along those lines. This update is exclusively about improving Google's ability to figure out what is "relevant", as opposed to what is intrinsically good, bad, trustworthy, or untrustworthy. This is a very core change to Google's algorithm that isn't targeted at anyone; it is simply trying to improve Google's ability to do information retrieval at a very large scale. As a result, anyone who starts talking about spam or content quality or anything else like that simply doesn't know what they are talking about. There are plenty of Google updates that target those things, but RankBrain is not one of them.

    Additionally, RankBrain has absolutely nothing to do with Google's Knowledge Graph or Hummingbird or anything like that. As someone who has had no issues changing Google's Knowledge Graph to my benefit, I can say that they are polar opposite approaches to answering queries. Hummingbird and updates like it are based on databases which hold very rigid rules between different words. For instance the phrase/entity "Larry Page" has a very specific "founder" relationship to the phrase/entity "Google". These are rigid rules that you would expect to see in a database somewhere. RankBrain is the opposite of that: words are represented as patterns, so there are no explicit rules between words. In other words, Hummingbird has explicit, black and white rules, and RankBrain is much more fluid.
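    If a picture helps, here's a crude sketch of that contrast. To be clear, this is just my own illustration of the two mindsets, not how either system is actually implemented inside Google: the Knowledge Graph side is an exact lookup of explicit facts, while the RankBrain side has no rules at all, only degrees of similarity between made-up patterns:

```python
# Knowledge-Graph-style: explicit, rigid relationships. Either the fact is
# in the database or it isn't -- there is no "sort of".
facts = {
    ("Larry Page", "founder_of"): "Google",
    ("Sergey Brin", "founder_of"): "Google",
}
print(facts.get(("Larry Page", "founder_of")))  # "Google"
print(facts.get(("Larry Page", "ceo_of")))      # None -- no rule, no answer

# RankBrain-style: no explicit rules, just patterns of numbers (invented here)
# and graded similarity between them.
patterns = {
    "Larry Page": [0.9, 0.8, 0.1],
    "Google":     [0.8, 0.9, 0.2],
    "car":        [0.1, 0.2, 0.9],
}

def overlap(a, b):
    return sum(x * y for x, y in zip(a, b))

print(overlap(patterns["Larry Page"], patterns["Google"]))  # ~1.46 -- strongly related
print(overlap(patterns["Larry Page"], patterns["car"]))     # ~0.34 -- weakly related
```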

    So in short, RankBrain has no impact on the Knowledge Graph, as the Knowledge Graph has very rigid relationships between entities, and RankBrain's architecture is very incompatible with that.

    What should I do about RankBrain?

    In my opinion RankBrain supports the trend of Google becoming less reliant on exact match anchor text and focusing more on the content of linking pages. If Google can do a better job representing an article in the same vector space as a keyword (basically finding a way to compare the relevancy of an entire article to a keyword in an apples-to-apples comparison), then Google is less reliant on more superficial or "hacky" ways of figuring out what is relevant to what.

    So in my opinion anchor text is going to continue to matter way way less compared to the content on the linking page itself. If Google thinks the linking page is about "auto insurance quotes" that would be more valuable than a link with the anchor text "auto insurance quotes". So the biggest thing I would do is make sure that your content on your linking pages is relevant to your keyword, because that is what Google is looking at and what RankBrain will allow it to look at.

    What shouldn't I do about RankBrain?

    What I wouldn't do is go crazy or change anything major. It is true that an article about "auto insurance bargains" could contain neither the word "auto" nor "bargain" and still rank well if Google thinks the article is very close to representing "auto insurance bargains". But you know what Google will likely think is even closer to representing an article about "auto insurance bargains"? An article that actually has that phrase in it! The way these algorithms work, they look at the differences between words, so as mentioned "car" and "auto" will likely be represented as very similar... but obviously "car" is more similar to "car", even if "auto" is pretty close.

    So a lot of people use this opportunity to go on about LSI keywords and things like that, and although Google might consider that at some point, there is no evidence that RankBrain deals with it at all. Based on the field of AI that RankBrain comes from, the only thing you could reasonably conclude is that RankBrain is designed to do a better job representing what keywords an article is "about", and nothing more. Google got to the point where it could mostly do away with meta keywords once it was able to approximate topicality with more basic algorithms (like maximizing log likelihood ratio), and now, as it does a better job determining the topicality of articles, other "keyword" based ways of signaling topicality (such as anchor text) are going to get less and less important.

    So what's the tl;dr?
    RankBrain only attempts to do a better job figuring out what an article is about. It doesn't target blackhat sites, it doesn't gauge article quality, and it doesn't look at links. The only real world changes you should make are making sure your articles (on both your ranking page and your linking page) are thematically about the keyword you are trying to rank for, as Google is making that into its biggest way of determining keyword relevancy.
     
    Mark23, xepa, ... and 1 other person like this.
  2. cherub

    cherub Active Member

    Great writeup @cardine! The first thing that came to mind from your description of RankBrain was the ancient Soundex algorithm
     
    cardine likes this.
  3. xepa

    xepa Member

    Great post! I do have one question for you:

    How do you know this is true? If they know what words belong together, could they punish a site that has words that don't belong together?
     
  4. cardine

    cardine Administrator Staff Member

    A lot of RankBrain (or any other similar algorithm) is trained off of co-occurrence. Basically, you learn things about words based on what words they appear next to. So for instance if you see the following sentences:

    "I bought a car"
    "I bought a desk"
    "I drove a car"
    "I drove a truck"

    You could deduce that "car" and "desk" share some properties because they both appear next to "bought", that "car" and "truck" share some properties because they both appear next to "drove", and lastly that "car", "desk", and "truck" all share some properties because they all appear next to "a". That can help an algorithm like this learn which words are related to which, by looking at what words tend to show up near each other.
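    Here's that idea as a tiny sketch, using the four sentences above. Real systems train a neural network over billions of words rather than counting a handful of sentences, but the underlying signal is the same: words that keep the same company start to look related:

```python
from collections import defaultdict

# The four example sentences from above.
sentences = [
    "i bought a car",
    "i bought a desk",
    "i drove a car",
    "i drove a truck",
]

# For each word, record which other words it shows up alongside.
contexts = defaultdict(set)
for sentence in sentences:
    words = sentence.split()
    for word in words:
        contexts[word].update(w for w in words if w != word)

# Words that share context start to look related, without anyone ever
# spelling out the relationship explicitly. (Set print order may vary.)
print(contexts["car"] & contexts["desk"])                      # {'i', 'bought', 'a'}
print(contexts["car"] & contexts["truck"])                     # {'i', 'drove', 'a'}
print(contexts["car"] & contexts["desk"] & contexts["truck"])  # {'i', 'a'}
```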

    So if we gave RankBrain the following two sentences:

    "Obama drove a car"
    "I drove a keyboard"

    It would likely think the second sentence is better than the first. It knows very little about Obama and considers it a much more rare word. However, it has seen all of the words in the second sentence, knows they are all more common, knows at the very least that "I drove a" can exist together (since it has existed before), and knows that "car" is at least vaguely similar to "keyboard". And Lincoln is about as rare as Obama and they both appear in many similar articles, making it extremely hard to know that one of these is impossible while the other one isn't.

    In short, the problem is that just because a bunch of words are common (and even topically similar) doesn't mean they all belong together. Since RankBrain can only look at what words tend to appear together, it will be heavily biased in favor of obviously bad sentences like "I drove a keyboard".

    The other big issue with this is that some of the best content out there tends to put words together in ways most other articles wouldn't. A New York Times article will tend to use more sophisticated vocabulary, which by definition makes it more uncommon. A sentence like "I moved my car" will be considered "better" by an algorithm like RankBrain than a sentence like "With expert precision I lifted the car tires from the ground."

    So very simplistically: co-occurrence is what powers RankBrain, and co-occurrence is a poor way to judge the actual quality of an article for the reasons stated above, most specifically that it punishes sophisticated vocabulary (which shows up more in high quality articles). As a result, RankBrain is extremely ill-suited to judging how high quality an article is.