Why is cleaning up duplicated URLs in Google Analytics reports so important?

When I wrote the article about using Filters to get the correct Shopify numbers in Google Analytics, the idea was to help the many Shopify store owners by covering a fairly niche topic, as there was no material on it on the Web. But after I got feedback from some colleagues that I should have started with the broader topic first (duplicated URLs in the Google Analytics Pages report, and why removing them is so important), I figured a new article was due.

Reading up on Regex and the Google Analytics Filters that use it would probably be a smart idea if you feel you are not on solid ground with Regular Expressions. Another useful article on cleaning up GA URLs, although written eight years ago, is still a helpful introduction to the topic we are covering today, and you should definitely read it before you delve into my post.

Google Analytics Pages report

Let’s begin by brushing up our knowledge about the Pages report and the metrics it contains. You can safely skip to the next section if you feel you are already comfortable and familiar with this Report.

Pageview

A number that shows how many times one particular page was loaded into the visitor’s browser. Reloading the page thus triggers another pageview.

Note that loading a page in the browser doesn’t necessarily mean the content of the page was consumed/read; that can be tracked separately with a scroll depth script for GA.

Unique Pageview

This metric is similar to the Pageview, except that a page is counted only once per session/visit to the website. You can have multiple pageviews of one page, but only one unique pageview of that page for as long as your visit lasts.
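
To make the difference concrete, here is a minimal sketch that counts both metrics from a made-up hit log. The `hits` list, the session IDs and the function names are assumptions for illustration only; GA does this counting internally.

```python
from collections import Counter

# Hypothetical hit log: (session_id, page_path), in the order the pages were loaded.
hits = [
    ("s1", "/articles/"),
    ("s1", "/articles/"),  # reload within the same session
    ("s1", "/contact/"),
    ("s2", "/articles/"),
]

def pageviews(hits):
    """Count every page load, reloads included."""
    return Counter(page for _, page in hits)

def unique_pageviews(hits):
    """Count each page at most once per session."""
    return Counter(page for _, page in set(hits))

print(pageviews(hits)["/articles/"])         # 3
print(unique_pageviews(hits)["/articles/"])  # 2
```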

Average Time on Page

As the name says, this metric should describe how long visitors have been on a page, on average.

Unfortunately, that is not what it actually describes, because GA measures the time on a page only as the difference between the timestamps of two hits: the load of this page and the load of the next one (or any other tracking hit sent to GA, including events). Effectively, GA doesn’t know how much time you spent on a page if you just loaded it, reviewed its content and then left without going to another page of the same site.
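
The timestamp-difference logic can be sketched like this. It is a simplification that only looks at pageview hits, and the `session` data and function name are assumptions made up for the example.

```python
# Hypothetical session: (seconds since the visit started, page_path) per pageview hit.
session = [(0, "/"), (40, "/articles/"), (190, "/contact/")]

def time_on_pages(session):
    """Return (page, seconds) pairs; the last page has no following hit, so it gets no time."""
    timed = []
    for (start, page), (next_start, _) in zip(session, session[1:]):
        timed.append((page, next_start - start))
    return timed

print(time_on_pages(session))     # [('/', 40), ('/articles/', 150)]
print(time_on_pages([(0, "/")]))  # []
```

The last page of a visit, and the only page of a single-page visit, never receives a measured time, which is exactly the blind spot described above.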

This default behavior of GA can also be modified and tuned so that time on page is tracked fairly accurately, but as a consequence, that modification heavily skews and reduces Bounce Rate, which is not what you would usually want. You can read more about that here.

Entrances

A number that shows in how many cases this page was a landing page — one that visitors began their visit through.

Notice the difference here from the Unique Pageviews: you can start your visit to the website on one particular page, and that will yield one Entrance and one Unique Pageview. But you can also reach this page after landing on some other page, which adds a Unique Pageview without an Entrance. That is why the number of Uniques will be equal to or higher than the number of Entrances for that page.

Bounce Rate

Percentage of visits (sessions) that began on this page and ended with only that one page viewed.

As previously mentioned when Average Time on Page was discussed, introducing additional events to the tracking on one page could result in reduced Bounce rate as well as increased Time tracked on that page.

% Exit

This number shows the percentage of visits that end on this particular page.

Since one visit might or might not begin on this page, the number of Exits will be equal to or higher than the number of Bounces out of that page. Said differently, if visitors start their visit on this page and then exit without visiting any other page, Exit = Bounce, but if they start on some other page and exit on this one, then it is one Exit and zero Bounces.
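
Here is a small sketch of how Entrances, Exits and Bounces relate for one page. It treats a bounce as a visit with a single pageview and ignores events, and the `sessions` data and function name are assumptions for illustration.

```python
from collections import Counter

# Hypothetical sessions: the ordered list of pages viewed in each visit.
sessions = [
    ["/articles/"],               # lands and leaves: entrance, exit and bounce
    ["/", "/articles/"],          # exits on /articles/ but did not land there
    ["/articles/", "/contact/"],  # lands on /articles/, exits elsewhere
]

def page_counts(sessions):
    """Return per-page counters of entrances, exits and bounces."""
    entrances, exits, bounces = Counter(), Counter(), Counter()
    for pages in sessions:
        entrances[pages[0]] += 1
        exits[pages[-1]] += 1
        if len(pages) == 1:  # single-page visit
            bounces[pages[0]] += 1
    return entrances, exits, bounces

entrances, exits, bounces = page_counts(sessions)
page = "/articles/"
print(entrances[page], exits[page], bounces[page])  # 2 2 1
```

Every bounce is also an exit, but the second visit shows an exit that is not a bounce.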

Page Value

Often a neglected metric, but quite an important one for a website that has a monetary value attached to GA goals or that counts transactions. Page Value is the total monetary value (transaction revenue plus goal value) generated in sessions where the page was viewed before the conversion, divided by the number of unique pageviews of that page. As a result, this metric shows how much one page contributes to the overall revenue or monetary gain of the website.
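
As a rough sketch of that calculation, assume every page listed in a session was viewed before the conversion, and that each session contributes one unique pageview per page. The `sessions` data and the `page_value` function are hypothetical, not part of GA.

```python
# Hypothetical sessions: pages viewed before converting, and the value each visit produced.
sessions = [
    {"pages": ["/", "/articles/", "/checkout/"], "value": 100.0},
    {"pages": ["/articles/"], "value": 0.0},
    {"pages": ["/", "/checkout/"], "value": 50.0},
]

def page_value(sessions, page):
    """Value from sessions that viewed the page, divided by its unique pageviews."""
    viewed = [s for s in sessions if page in s["pages"]]
    if not viewed:
        return 0.0
    return sum(s["value"] for s in viewed) / len(viewed)

print(page_value(sessions, "/articles/"))  # 50.0
print(page_value(sessions, "/"))           # 75.0
```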

Duplicated URLs and the impact on the Google Analytics Page metrics

For the purpose of this article, let me divide the Page URLs reported in GA into two groups: those which are “clean” and those with some additional “parameters”.

The “clean” URLs contain only the path to the page on the website, and nothing else. An example of this case is the path to this article: /articles/google-analytics/cleaning-up-duplicated-google-analytics-reports-important/

On the other hand, URLs with parameters contain a question mark (?) followed by one or more name=value pairs, separated by an ampersand (&). An example would be: /articles/?page=2&trkid=1lkjdsf. Notice the two different parameters, “page” and “trkid”, with their values.
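
If you want to see how such a URL breaks down into its path and parameters, here is a minimal sketch using Python's standard `urllib.parse` module. The helper name `split_page_url` is made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

def split_page_url(url):
    """Split a page URL into its clean path and a dict of its parameters."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)

path, params = split_page_url("/articles/?page=2&trkid=1lkjdsf")
print(path)    # /articles/
print(params)  # {'page': ['2'], 'trkid': ['1lkjdsf']}
```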

These pages might have parameters for various reasons: they have been attached by some third-party service or tracking script that you are using; or they are attached by your website (to show pagination, the order of elements on some product category pages, a session ID, etc.); or they might show sessions generated by your developer while making changes and testing your site (like “preview_id”, for example).

So, imagine you are analyzing the performance of one page trying to determine if it requires optimization of the content (because of a high bounce rate since it is a landing page that should dispatch visitors, like a homepage), or how valuable it is in achieving your financial goal for the website. Now, instead of seeing only one /articles/ page in the list on the Pages report, you see /articles/?page=2, then /articles/?page=3&ssid=123543 or any other variation of these cases.

Naturally, this fragments the data and gives you inaccurate metrics. Instead of having only one page, you have multiple variations of it, even though it might be the same page. As a result, Pageviews for that page are underreported, Average Time on Page and Bounce Rate are incorrect, and Page Value is lower than it should be. All of these can lead you to wrong conclusions about the performance of that page.
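
To illustrate the underreporting, the sketch below merges report rows that differ only in their parameters. The `rows` numbers are invented, and it assumes that all variations really are the same page, which you should verify for your own site (pagination, for example, may show different content).

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical Pages report rows: (page, pageviews).
rows = [
    ("/articles/", 120),
    ("/articles/?page=2", 30),
    ("/articles/?page=3&ssid=123543", 10),
    ("/contact/", 25),
]

def merge_by_path(rows):
    """Sum pageviews of all URL variations that share the same clean path."""
    totals = defaultdict(int)
    for page, views in rows:
        totals[urlsplit(page).path] += views
    return dict(totals)

print(merge_by_path(rows))  # {'/articles/': 160, '/contact/': 25}
```

Looking only at the clean /articles/ row would report 120 pageviews, while the page actually received 160.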

Cleaning it all up

I have covered the cleanup pretty thoroughly in the article about Shopify URLs so I won’t repeat the whole procedure here, with all the screenshots.

Briefly, you mitigate this issue by creating Google Analytics Filters. If you need the values of those parameters, you can choose to store them, as the other article describes. If you don’t, you can skip creating Custom Dimensions and the Filters that output to them, and proceed directly to the Filter that removes these additional parameters.
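
GA Filters are configured in the Admin interface, not in code, but the effect of a removal filter can be sketched in Python. Both function names are made up for this example, and the regular expression is only one plausible pattern for stripping a whole query string; the selective version uses URL parsing rather than a regex to keep it readable.

```python
import re
from urllib.parse import urlsplit, parse_qsl, urlencode

def strip_all_params(uri):
    """Drop everything from the first '?' onwards."""
    return re.sub(r"\?.*$", "", uri)

def strip_params(uri, unwanted):
    """Drop only the named parameters, keeping the rest in their original order."""
    parts = urlsplit(uri)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in unwanted]
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")

print(strip_all_params("/articles/?page=2&trkid=1lkjdsf"))     # /articles/
print(strip_params("/articles/?page=2&trkid=1lkjdsf", {"trkid"}))  # /articles/?page=2
```

Stripping everything is the simplest option; removing only selected parameters makes sense when some of them, such as pagination, identify genuinely different content.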

There is a different approach to this matter: cleaning those parameters through Google Tag Manager before they reach Analytics. But that solution adds a layer of complexity (you either need to set up GTM from scratch or add custom JavaScript to the one you already have), so I won’t deal with it here.

Posted on: August 4, 2018 by igorkolosov
