All About Analytics Data

About this guide

Read this guide if

  • you'd like to know how to aggregate, store and serve new analytics data with your plugin
  • you'd like to know what the Archiving Process is and how it is used to automatically aggregate and cache analytics data
  • you'd like to know how analytics data is stored and manipulated in PHP
  • you'd like to know what segments are and how you can define your own

Guide assumptions

This guide assumes that you:

  • can code in PHP,
  • and have a general understanding of extending Piwik (if not, read our Getting Started guide).

About Analytics

To analyze data is to search for patterns in a set of things. In Piwik those things are visits, web actions and goal conversions.

We search for patterns by reducing the set of things. In other words, we search for patterns by grouping individual things together to create subsets that are both recognizable and meaningful.

In Piwik the result of that grouping is the analytics data that it stores, displays and serves through an API. Read on to learn exactly what this data contains, how Piwik calculates and stores it, and how it is made available to Piwik users.

Analytics Reports & Metrics

Piwik aggregates and persists two types of analytics data: reports and metrics. The difference between the two is that a metric is a single numeric value whereas a report is a two-dimensional array of values. Reports will normally contain metric values, but they can contain other data (either additionally or in lieu of metric values).

Core metrics

All reports are defined by plugins. Metrics can also be defined by plugins, but there are several, called core metrics, that are defined and calculated by Piwik Core.

New reports that analyze visits, action types or conversions should contain these metrics.

The following is a list of core metrics that relate to a set of visits:

  • Visits: The number of tracked visits (a visit is a series of events in which no two consecutive events happened more than 30 minutes apart). Internally stored with the 'nb_visits' metric name.
  • Unique Visitors: The number of unique sources of visits (a visit source is an entity that causes a visit to be tracked). Internally stored with the 'nb_uniq_visitors' metric name.
  • Actions: The number of tracked actions (an action is an event tracked by Piwik). Internally stored with the 'nb_actions' metric name.
  • Max Actions: The maximum number of actions that occurred in one visit. Internally stored with the 'max_actions' metric name.
  • Sum Visit Length: The sum of each visit's elapsed time. Internally stored with the 'sum_visit_length' metric name.
  • Bounce Count: The number of visits that consisted of only one action. Internally stored with the 'bounce_count' metric name.
  • Converted Visits: The number of visits that caused at least one conversion. Includes conversions for every goal of a site. Internally stored with the 'nb_visits_converted' metric name.
  • Conversions: The number of conversions tracked for this set of visits. Includes conversions for every goal of a site. Internally stored with the 'nb_conversions' metric name.
  • Revenue: The total revenue generated by these visits. Includes revenue for every goal of a site plus its ecommerce revenue. Internally stored with the 'revenue' metric name.

The following is a list of core metrics that relate to a single action type:

  • Hits: The number of times this action was ever done. Internally stored with the 'nb_hits' metric name.
  • Sum Time Spent: The total amount of time the user spent doing this action. Internally stored with the 'sum_time_spent' metric name.
  • Sum Page Generation Time: The total amount of time a server spent serving this action. Internally stored with the 'sum_time_generation' metric name.
  • Hits With Generation Time: The number of hits that included generation time information. Internally stored with the 'nb_hits_with_time_generation' metric name.
  • Min Page Generation Time: The minimum amount of time a server spent serving this action. Internally stored with the 'min_time_generation' metric name.
  • Max Page Generation Time: The maximum amount of time a server spent serving this action. Internally stored with the 'max_time_generation' metric name.
  • Unique Exit Visitors: The number of unique visitors that ever exited a site after this action. Internally stored with the 'exit_nb_uniq_visitors' metric name.
  • Exit Visits: The total number of visits that ended with this action. Internally stored with the 'exit_nb_visits' metric name.
  • Unique Entry Visitors: The total number of unique visitors that started a visit with this action. Internally stored with the 'entry_nb_uniq_visitors' metric name.
  • Entry Visits: The total number of visits that started with this action. Internally stored with the 'entry_nb_visits' metric name.
  • Entry Actions: The total number of actions performed during visits that started with this action. Internally stored with the 'entry_nb_actions' metric name.
  • Entry Sum Visit Length: The sum of each entry visit's elapsed time. Internally stored with the 'entry_sum_visit_length' metric name.
  • Entry Bounce Count: The number of visits that consisted of this action and no other. Internally stored with the 'entry_bounce_count' metric name.
  • Hits From Search: The number of times this action was done after a site search. Internally stored with the 'nb_hits_following_search' metric name.

The following is a list of core metrics that relate to the set of ecommerce conversions (either all orders or all abandoned carts) recorded for a set of visits:

  • Revenue Subtotal: The total cost of every item that was a part of these orders or abandoned carts. Internally stored with the 'revenue_subtotal' metric name.
  • Revenue Tax: The total tax amount applied to these orders/abandoned carts. Internally stored with the 'revenue_tax' metric name.
  • Revenue Shipping: The total amount of shipping applied to these orders/abandoned carts. Internally stored with the 'revenue_shipping' metric name.
  • Revenue Discount: The total amount of discounts applied to these orders/abandoned carts. Internally stored with the 'revenue_discount' metric name.
  • Ecommerce Item Count: The total number of items in these orders/abandoned carts. Internally stored with the 'items' metric name.

Goal specific metrics

The following is a list of core metrics that relate to a set of visits and one goal of a site:

  • Goal Conversions: The conversions tracked for a specific goal and this set of visits. Stored in reports with a metric name of the format 'goal_%idGoal%_nb_conversions'.
  • Goal Revenue: The total revenue generated by the conversions for a specific goal. Stored in reports with a metric name of the format 'goal_%idGoal%_revenue'.

Note: In the metric names displayed above, '%idGoal%' should be replaced with the ID of a goal.

Goal specific metrics are stored in the database in the 'goals' column of serialized reports. The column contains a PHP array mapping goal IDs to arrays of goal specific metric values. These values are set as normal column values, with the metric names described above, by the AddColumnsProcessedMetricsGoal DataTable filter.
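To make the storage layout above concrete, here is a minimal sketch in plain PHP that flattens a row's 'goals' array into per-goal columns named after the goal_%idGoal%_... pattern. The helper flattenGoalMetrics() is hypothetical and is not the AddColumnsProcessedMetricsGoal filter itself; it also assumes the inner arrays are keyed by string metric names, which real archives may not use.

```php
<?php
// Hypothetical helper (not Piwik's AddColumnsProcessedMetricsGoal filter):
// copies each goal's metrics out of the 'goals' array into flat columns
// named goal_<idGoal>_<metric>.
function flattenGoalMetrics(array $row): array
{
    $flat = $row;
    unset($flat['goals']);
    foreach ($row['goals'] ?? [] as $idGoal => $metrics) {
        foreach ($metrics as $name => $value) {
            $flat["goal_{$idGoal}_{$name}"] = $value;
        }
    }
    return $flat;
}

// Example row; string metric names inside 'goals' are an assumption
// made for readability.
$row = [
    'label'     => 'Firefox',
    'nb_visits' => 120,
    'goals'     => [
        1 => ['nb_conversions' => 4, 'revenue' => 80.0],
        2 => ['nb_conversions' => 1, 'revenue' => 15.5],
    ],
];

print_r(flattenGoalMetrics($row));
```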

Processed metrics

In the interests of efficiency (in terms of both the speed of the Archiving Process and the size of the database), many metrics are not stored in the database. Because these metrics can be derived from other metrics, they are calculated right before reports are served instead. These metrics are collectively called processed metrics. Below is the list of processed metrics that are calculated using core metrics.

New reports that analyze visits, action types or conversions should have these metrics added when possible.

Note: Some processed metrics will appear multiple times in the lists below. These metrics have different meanings based on the reports they are in.

The following is a list of processed metrics that relate to a set of visits:

  • Conversion Rate: The percent of visits that had at least one conversion. Stored in reports with the 'conversion_rate' metric name.
  • Actions Per Visit: The average number of actions for a single visit. Stored in reports with the 'nb_actions_per_visit' metric name.
  • Average Time On Site: The average amount of time spent per visit, in seconds. Stored in reports with the 'avg_time_on_site' metric name.
  • Bounce Rate: The percent of visits that resulted in a bounce. Stored in reports with the 'bounce_rate' metric name.
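The visit-level processed metrics above are simple ratios of core metrics. The sketch below computes them from an array of core metric values; the function name is hypothetical, and the rounding (and returning plain numbers rather than formatted percent strings) is an assumption, so Piwik's own output may be formatted differently.

```php
<?php
// Hypothetical helper: derives the visit-level processed metrics from core
// metrics using the definitions in the list above. Rounding is an assumption.
function computeVisitProcessedMetrics(array $m): array
{
    $visits = $m['nb_visits'] ?? 0;
    if ($visits === 0) {
        // no visits: avoid dividing by zero
        return ['conversion_rate' => 0, 'nb_actions_per_visit' => 0,
                'avg_time_on_site' => 0, 'bounce_rate' => 0];
    }
    return [
        // percent of visits with at least one conversion
        'conversion_rate'      => round(100 * $m['nb_visits_converted'] / $visits, 2),
        'nb_actions_per_visit' => round($m['nb_actions'] / $visits, 1),
        // seconds per visit
        'avg_time_on_site'     => (int) round($m['sum_visit_length'] / $visits),
        // percent of visits with a single action
        'bounce_rate'          => round(100 * $m['bounce_count'] / $visits, 2),
    ];
}

$core = ['nb_visits' => 200, 'nb_actions' => 640, 'sum_visit_length' => 24000,
         'bounce_count' => 90, 'nb_visits_converted' => 12];
print_r(computeVisitProcessedMetrics($core));
```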

The following is a list of processed metrics that relate to a single action type:

  • Average Generation Time: The average amount of time it took for a server to serve this action. Stored in reports with the 'avg_time_generation' metric name.
  • Average Number of Search Result Pages Viewed: The average number of search result pages viewed after a site search. Only valid for site search keywords and site search categories. Stored in reports with the 'nb_pages_per_search' metric name.
  • Average Time On Page: The average amount of time users spent doing this action. Stored in reports with the 'avg_time_on_page' metric name.
  • Entry Bounce Rate: The percent of visits that started with this action and consisted of this action and no other. Stored in reports with the 'bounce_rate' metric name.
  • Exit Rate: The percent of visits that included this action and ended with it. Stored in reports with the 'exit_rate' metric name.
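A rough sketch of two of the action-level processed metrics, assuming Average Generation Time divides 'sum_time_generation' by 'nb_hits_with_time_generation' and Entry Bounce Rate divides 'entry_bounce_count' by 'entry_nb_visits'. Those denominators are inferred from the metric descriptions, and the helper name is hypothetical.

```php
<?php
// Hypothetical helper: processed metrics for one action row. Denominators
// are inferred from the core metric descriptions, not taken from Piwik code.
function computeActionProcessedMetrics(array $m): array
{
    $out = [];

    // only hits that reported a generation time count toward the average
    $timedHits = $m['nb_hits_with_time_generation'] ?? 0;
    if ($timedHits > 0) {
        $out['avg_time_generation'] = round($m['sum_time_generation'] / $timedHits, 3);
    }

    // share of visits starting with this action that had no other action
    $entries = $m['entry_nb_visits'] ?? 0;
    if ($entries > 0) {
        $out['bounce_rate'] = round(100 * $m['entry_bounce_count'] / $entries, 2);
    }

    return $out;
}

$row = ['nb_hits' => 50, 'sum_time_generation' => 12.0,
        'nb_hits_with_time_generation' => 40,
        'entry_nb_visits' => 20, 'entry_bounce_count' => 5];
print_r(computeActionProcessedMetrics($row));
```

Metrics whose inputs are missing are simply left out rather than set to zero, which keeps a report from showing a misleading "0 seconds" average.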

The following is a list of processed metrics that relate to the set of ecommerce orders recorded for a set of visits:

  • Average Order Revenue: The average revenue of each order. Stored in reports with the 'avg_order_revenue' metric name.

The following is a list of processed metrics that relate to the set of ecommerce items in a set of orders or abandoned carts:

  • Average Price: The average price of each item. Stored in reports with the 'avg_price' metric name.
  • Average Quantity: The average number of each item in an order/abandoned cart. Stored in reports with the 'avg_quantity' metric name.
  • Product Conversion Rate: The percent of orders/abandoned carts that include this item. Stored in reports with the 'conversion_rate' metric name.

Goal specific metrics

The following is a list of processed metrics that are also specific to one goal of one site:

  • Average Revenue per Visit: The average amount of revenue generated per visit for this goal. Stored in reports with the 'goal_%idGoal%_revenue_per_visit' metric name.

Note: In the metric names displayed above, '%idGoal%' should be replaced with the ID of the goal in question.

Naming metrics

Plugins that want to calculate and persist their own metrics must give them a name with the following format: "PluginName_metricName" where PluginName is the name of the plugin and metricName is the name of the metric. For example: "MyPlugin_myFancyMetric".

This naming convention is required in order to determine which plugins define which metrics. Not following this convention will result in errors during the Archiving Process.

Core metrics all have special names and do not follow this convention.
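The convention can be checked mechanically. The sketch below is a hypothetical validator, not something Piwik provides; the exact character rules it enforces for the part after the underscore are an assumption. The same check applies to report names, which follow the same format.

```php
<?php
// Hypothetical validator for the "PluginName_metricName" convention.
// The identifier rules for the metric part are an assumption.
function isValidPluginMetricName(string $name, string $pluginName): bool
{
    $prefix = $pluginName . '_';
    if (strpos($name, $prefix) !== 0) {
        return false; // must start with "PluginName_"
    }
    $metric = substr($name, strlen($prefix));
    // remaining part: non-empty, starts with a letter
    return preg_match('/^[A-Za-z][A-Za-z0-9_]*$/', $metric) === 1;
}

var_dump(isValidPluginMetricName('MyPlugin_myFancyMetric', 'MyPlugin'));
```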

Reports and DataTables

Reports are stored in memory using the DataTable class. A DataTable is an array of rows where each row is an array of columns.

Each row contains metrics that relate to a set of visits, actions, conversions or some other entity. The set is defined and described by a special label column. How the column describes the set depends entirely upon the specific report. For example, in the UserSettings.getBrowser report, a row with the label Firefox would hold metrics for the set of all visits that used the Firefox browser.

Some reports, like VisitsSummary.get, will not have a label column. These reports will have only one row, which refers to the entire set of entities.
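As a mental model only, a report can be pictured as a list of rows, each row an array of columns. The sketch below uses plain PHP arrays in place of the DataTable class; findRowByLabel() is a hypothetical helper, and the numbers are made up.

```php
<?php
// Plain-array stand-in for a DataTable (not Piwik's class): each row is an
// array of columns, and 'label' identifies the set of visits it describes.
$browserReport = [
    ['label' => 'Firefox', 'nb_visits' => 120, 'nb_actions' => 400],
    ['label' => 'Chrome',  'nb_visits' => 300, 'nb_actions' => 950],
];

// A report like VisitsSummary.get: one unlabeled row for the whole set.
$visitsSummary = [
    ['nb_visits' => 420, 'nb_actions' => 1350],
];

// Hypothetical helper: returns the row with the given label, or null.
function findRowByLabel(array $table, string $label): ?array
{
    foreach ($table as $row) {
        if (($row['label'] ?? null) === $label) {
            return $row;
        }
    }
    return null;
}

print_r(findRowByLabel($browserReport, 'Firefox'));
```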

Row metadata

In addition to metrics, each row can also contain metadata. This metadata will usually assist the label column in describing the set of things the row represents.

Some metadata have special meanings. For example, metadata with the name 'logo' is treated as a path to an image that is used to describe the row. This image is displayed alongside rows when reports are displayed in the UI. The UserSettings.getBrowser and UserSettings.getOs reports use this metadata value to show an icon for each browser and OS.

Metadata with the name 'url' is treated as a URL that describes the row. The label of the row is linked to this URL when reports are displayed in the UI.

Subtables

Reports can be hierarchical. Each row in a report can be attached to another table of data. Any row in those tables can be attached to more tables, and so on ad infinitum. Tables that are attached to rows are called subtables.

Subtables provide further analytics for the set of visits that a row represents. For example, the Actions.getPageUrls report contains rows that describe a set of page view actions based on the first part of the page's URL. If this part is a directory and not a file, the row may have a subtable that describes that row's set of page view actions based on the second part of the page's URL.

Another example: the Referrers.getSearchEngines report contains a row for each search engine that was used in a visit. Each row will have a subtable that describes the keywords that were used with that search engine. The subtable rows will contain metric values for visits that used a specific keyword (determined by the subtable row) with a specific search engine (determined by the parent row).
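The search engine example can be sketched with nested plain arrays, where each row may carry a 'subtable' key. This only illustrates the shape of the data; it is not how Piwik's DataTable links subtables, and the helper names and numbers are hypothetical.

```php
<?php
// Plain-array sketch of a hierarchical report (not Piwik's DataTable API).
$searchEngines = [
    ['label' => 'Google', 'nb_visits' => 42, 'subtable' => [
        ['label' => 'piwik',         'nb_visits' => 30],
        ['label' => 'web analytics', 'nb_visits' => 12],
    ]],
    ['label' => 'Bing', 'nb_visits' => 5, 'subtable' => [
        ['label' => 'piwik', 'nb_visits' => 5],
    ]],
];

// Counts rows at every level; subtables may nest to any depth.
function countRowsRecursive(array $table): int
{
    $count = 0;
    foreach ($table as $row) {
        $count++;
        if (!empty($row['subtable'])) {
            $count += countRowsRecursive($row['subtable']);
        }
    }
    return $count;
}

// Metrics for one keyword (child row) used with one search engine (parent row).
function findSubtableRow(array $table, string $parentLabel, string $childLabel): ?array
{
    foreach ($table as $row) {
        if ($row['label'] !== $parentLabel) {
            continue;
        }
        foreach ($row['subtable'] ?? [] as $child) {
            if ($child['label'] === $childLabel) {
                return $child;
            }
        }
    }
    return null;
}

print_r(findSubtableRow($searchEngines, 'Google', 'piwik'));
```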

Naming Reports

Reports should be named in the same way as non-core metrics. That is, they should have a name with the following format: "PluginName_reportName" where PluginName is the name of the plugin and reportName is the name of the report. For example: "MyPlugin_myFancyReport".

Plugins that do not follow this convention will cause errors during the Archiving Process.

Analytics Parameters

Reports and metrics provide analytics data about a set of things. Piwik determines what is in this set by using three constraints: a website ID, a period and a segment.

The website ID selects visits that were tracked for a specific website. This ID is specified in all HTTP requests by the idSite query parameter.

The period selects visits that were tracked within a specific date range. The period is specified in all HTTP requests by the date and period query parameters.

The segment selects visits based on a boolean expression that uses visit properties. It is specified in all HTTP requests by the segment query parameter and can be used to select almost any conceivable subset of visits.
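For illustration, the three parameters can be assembled into a Reporting API request with PHP's http_build_query(), which also URL-encodes the segment expression. The host name is hypothetical, and the segment 'browserCode==FF' is an assumed example expression; check the segment reference of your Piwik version for the dimensions it supports.

```php
<?php
// Sketch: the website ID, period and segment as query parameters.
// Host and segment expression are example assumptions.
$params = [
    'module'  => 'API',
    'method'  => 'UserSettings.getBrowser',
    'idSite'  => 1,              // website ID
    'period'  => 'day',          // period type
    'date'    => '2014-01-01',   // date the period refers to
    'segment' => 'browserCode==FF',
    'format'  => 'json',
];

// http_build_query() percent-encodes '==' as '%3D%3D'
$url = 'https://piwik.example.org/index.php?' . http_build_query($params);
echo $url, PHP_EOL;
```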

Analytics parameters are normally stored in reports as report metadata (that is, they are stored as DataTable metadata).

Every report and metric describes a set of things determined by these three parameters: the website, period and segment.

Report & Metric Persistence (Archive Data)

When persisted, reports and metrics are collectively termed Archive Data, which simply means that the data has been cached and does not need to be re-calculated.

Persisted reports and metrics are indexed by the website ID, period and segment. The date and time that the data was calculated and cached is also attached to each report and metric. To learn the specifics of how this is done with MySQL see the Piwik database schema.

Metric persistence

Metrics are numeric values and so there is nothing special done when persisting them. The website ID, period, segment and datetime of caching are attached to the metric value, and all this information is saved.

Report persistence

Reports are complex data structures and so there is some extra processing required before they are persisted.

The report's list of rows (an array of DataTable\Row instances) is serialized using PHP's serialize function. The string result is then compressed using gzcompress.

Finally, the website ID, period, segment and datetime of caching are attached to the compressed data, and all of this information is then saved.
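The serialize-then-compress steps map directly onto PHP's serialize() and gzcompress() (the latter needs the zlib extension). In this sketch plain arrays stand in for DataTable\Row instances, and the keys of $archiveEntry are illustrative only; they are not Piwik's actual table columns.

```php
<?php
// Plain arrays stand in for DataTable\Row instances; values are made up.
$rows = [
    ['label' => 'Firefox', 'nb_visits' => 120],
    ['label' => 'Chrome',  'nb_visits' => 300],
];

$serialized = serialize($rows);   // step 1: serialize the list of rows
$blob = gzcompress($serialized);  // step 2: compress the resulting string

// Illustrative entry: these keys are NOT Piwik's real schema.
$archiveEntry = [
    'idSite'   => 1,
    'period'   => 'day',
    'date'     => '2014-01-01',
    'segment'  => '',
    'archived' => gmdate('Y-m-d H:i:s'),
    'value'    => $blob,
];

// Reading the data back reverses both steps.
$restored = unserialize(gzuncompress($archiveEntry['value']));
```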

Records

When a report is archived, it is called a record, not a report. We make a distinction because multiple reports can sometimes be generated from one record.

For example, the UserSettings plugin uses one record to hold visits by browser information. This record is used to generate both the UserSettings.getBrowserVersion report and the UserSettings.getBrowser report; the getBrowser report is created by further processing the same data. The plugin could have archived both reports, but this would have been a massive waste of space, considering that the extra report would be cached for every website/period/segment combination.
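The idea of deriving one report from another at read time can be sketched as grouping browser-version rows by browser and summing their visits. The "Browser;Version" label format and the helper name are assumptions made for this example; the real record's labels may look different.

```php
<?php
// Hypothetical helper: builds a per-browser report from a per-browser-version
// record. Assumes labels look like "Browser;Version".
function browsersFromVersions(array $versionRows): array
{
    $byBrowser = [];
    foreach ($versionRows as $row) {
        [$browser] = explode(';', $row['label'], 2);
        $byBrowser[$browser] = ($byBrowser[$browser] ?? 0) + $row['nb_visits'];
    }
    arsort($byBrowser); // most visits first

    $result = [];
    foreach ($byBrowser as $label => $visits) {
        $result[] = ['label' => $label, 'nb_visits' => $visits];
    }
    return $result;
}

$versionRecord = [
    ['label' => 'Firefox;26.0', 'nb_visits' => 70],
    ['label' => 'Firefox;25.0', 'nb_visits' => 50],
    ['label' => 'Chrome;31.0',  'nb_visits' => 300],
];
print_r(browsersFromVersions($versionRecord));
```

Only $versionRecord would need to be archived; the per-browser rows are recomputed whenever the report is requested.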

Record storage guidelines

Care must be taken to store as little as possible when persisting records. Make sure to follow the guidelines below before inserting records as archive data:

  • Records should not be stored with string column names. Instead, these should be replaced with integer column IDs (see Metrics for a list of existing ones).
  • Metadata that can be derived from existing data should not be stored with reports. Instead, it should be added in API methods when turning records into reports.
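The first guideline can be sketched as a pair of mapping functions: one swaps string column names for integer IDs before storage, the other maps them back when the record becomes a report (the job the ReplaceColumnNames filter does for real reports). The ID values below are hypothetical; Piwik defines its own in the Metrics class.

```php
<?php
// Hypothetical column IDs; Piwik's real ones live in the Metrics class.
const COLUMN_IDS = ['nb_visits' => 1, 'nb_actions' => 2, 'sum_visit_length' => 3];

// Before storing: replace known column names with their integer IDs.
function compactColumns(array $row): array
{
    $out = [];
    foreach ($row as $name => $value) {
        // columns without an ID (such as 'label') keep their string name
        $out[COLUMN_IDS[$name] ?? $name] = $value;
    }
    return $out;
}

// When building the report: map integer IDs back to column names.
function expandColumns(array $row): array
{
    $names = array_flip(COLUMN_IDS);
    $out = [];
    foreach ($row as $key => $value) {
        $out[$names[$key] ?? $key] = $value;
    }
    return $out;
}

$row = ['label' => 'Firefox', 'nb_visits' => 120, 'nb_actions' => 400];
print_r(compactColumns($row));
```

The saving per row is small, but it is repeated for every row of every record cached for every website/period/segment combination.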

The Archiving Process

Analytics data is calculated and cached on-demand. When a report for a specific website, period and segment (if any) is requested, Piwik will check if the data has been cached, and if not Piwik will generate and cache it.

Archiving logic (the logic that calculates and caches analytics data) is defined by individual plugins. When archiving is initiated, every report defined by a plugin is archived together, rather than individually.

If no segment is supplied in the data query and data cannot be found, every report of every plugin will be generated and cached all at once. If a segment is supplied, then the reports that belong to the same plugins as the requested data will be generated and cached.
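A heavily simplified model of on-demand archiving: archive data is looked up by website, period and segment, and on a cache miss every record is computed and cached at once. The class, the key format and the $computeAll callback are hypothetical stand-ins; the real process persists data in the database and treats segmented requests per plugin, as described above.

```php
<?php
// Simplified, hypothetical model of on-demand archiving with an in-memory cache.
final class ArchiveCacheSketch
{
    private array $cache = [];
    public int $computations = 0;

    public function get(int $idSite, string $period, string $segment,
                        string $record, callable $computeAll)
    {
        $key = $idSite . '|' . $period . '|' . $segment;
        if (!isset($this->cache[$key])) {
            // miss: compute every record for this site/period/segment together
            $this->cache[$key] = $computeAll($idSite, $period, $segment);
            $this->computations++;
        }
        return $this->cache[$key][$record] ?? null;
    }
}

$archive = new ArchiveCacheSketch();
$computeAll = fn($idSite, $period, $segment) =>
    ['MyPlugin_myFancyMetric' => 7, 'MyPlugin_myFancyReport' => []];
var_dump($archive->get(1, 'day:2014-01-01', '', 'MyPlugin_myFancyMetric', $computeAll));
```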

Plugin Archivers

Plugins that want to archive reports and metrics define a class called Archiver that extends Piwik\Plugin\Archiver. This class will be automatically detected and instantiated by Piwik during the Archiving Process.

Report & Metric Aggregation

Reports and metrics are calculated differently based on the period type.

For day periods, the visits/actions/conversions/etc. (called log data) are themselves aggregated.

For other periods, the reports & metrics for the days within the period are aggregated together. For example, when generating a report for a week period, the report for each day within the week (i.e., Monday, Tuesday, Wednesday, etc.) will be queried and then aggregated together. This is far faster than aggregating each individual visit/action/etc. that was tracked during the entire week, and for most metrics it produces the same result [1].

Log data aggregation is handled by the LogAggregator class. Archive data aggregation is handled by the ArchiveProcessor::aggregateDataTableRecords and ArchiveProcessor::aggregateNumericMetrics methods. Plugins can access a LogAggregator instance and an ArchiveProcessor instance through the Piwik\Plugin\Archiver class.

To learn more about how aggregation is accomplished with Piwik's MySQL backend, read the Piwik database schema guide.

[1] Because of this technique, we cannot calculate unique visitors for non-day periods without aggregating over all visits within the period.
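The week-from-days approach can be sketched by merging day reports row by row on their label. This plain-PHP helper is hypothetical and is not ArchiveProcessor::aggregateDataTableRecords; the rule of taking max()/min() for columns prefixed with 'max_'/'min_' and summing everything else is an assumption chosen to match metrics such as 'max_actions'. Note that 'nb_uniq_visitors' cannot be merged correctly this way, since a visitor seen on two days would be counted twice, which is the limitation described in [1].

```php
<?php
// Hypothetical helper: merges several day reports into one, keyed by label.
function sumReportsByLabel(array $dayReports): array
{
    $sum = [];
    foreach ($dayReports as $report) {
        foreach ($report as $row) {
            $label = $row['label'];
            foreach ($row as $column => $value) {
                if ($column === 'label') {
                    continue;
                }
                $current = $sum[$label][$column] ?? null;
                if ($current === null) {
                    $sum[$label][$column] = $value;            // first occurrence
                } elseif (strpos($column, 'max_') === 0) {
                    $sum[$label][$column] = max($current, $value);
                } elseif (strpos($column, 'min_') === 0) {
                    $sum[$label][$column] = min($current, $value);
                } else {
                    $sum[$label][$column] = $current + $value; // additive metric
                }
            }
        }
    }
    return $sum;
}

$monday  = [['label' => 'Firefox', 'nb_visits' => 10, 'max_actions' => 4]];
$tuesday = [['label' => 'Firefox', 'nb_visits' => 5,  'max_actions' => 9],
            ['label' => 'Chrome',  'nb_visits' => 7,  'max_actions' => 2]];
print_r(sumReportsByLabel([$monday, $tuesday]));
```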

Report & Metric Caching

Reports and metrics are persisted using the ArchiveProcessor class. Metrics are inserted using the ArchiveProcessor::insertNumericRecord method. Reports are first serialized using the DataTable::getSerialized method and then inserted using the ArchiveProcessor::insertBlobRecord method:

use Piwik\Config;
use Piwik\DataTable;
use Piwik\Metrics;

// inside your plugin's Archiver class (which extends Piwik\Plugin\Archiver)
$archiveProcessor = $this->getProcessor();

// insert a numeric value
$myFancyMetric = 0; // ... calculate the metric value ...
$archiveProcessor->insertNumericRecord('MyPlugin_myFancyMetric', $myFancyMetric);

// insert a record (with all of its subtables)
$maxRowsInTable = Config::getInstance()->General['datatable_archiving_maximum_rows_standard'];

$dataTable = new DataTable(); // ... fill it by aggregating visits ...
$serializedData = $dataTable->getSerialized($maxRowsInTable, $maxRowsInSubtable = $maxRowsInTable,
                                            $columnToSortBy = Metrics::INDEX_NB_VISITS);

$archiveProcessor->insertBlobRecord('MyPlugin_myFancyReport', $serializedData);

Pre-archiving with cron

Though data can be generated on demand, relying on this for every user would be highly inefficient and would create a poor user experience: users whose websites receive a significant number of visits would experience a large delay before being able to view their reports. Piwik solves this problem with a console command that launches the archiving process. It should be executed by cron.

The console command can archive data for every website and for every period except range periods. Reports & metrics for stored segments will also be archived.

The console command will remember when it was last executed and will only initiate the archiving process for a website if there have been visits since that time.
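The "only if there have been visits since the last run" check amounts to a timestamp comparison per website. The function and its input format (last visit timestamp per site ID, null for no visits) are hypothetical; the real command keeps track of this state itself.

```php
<?php
// Hypothetical sketch: pick the websites that had visits after the last run.
// $lastVisitBySite maps site IDs to a Unix timestamp, or null for no visits.
function sitesToArchive(array $lastVisitBySite, int $lastRunTimestamp): array
{
    $sites = [];
    foreach ($lastVisitBySite as $idSite => $lastVisit) {
        if ($lastVisit !== null && $lastVisit > $lastRunTimestamp) {
            $sites[] = $idSite;
        }
    }
    return $sites;
}

print_r(sitesToArchive([1 => 1000, 2 => 500, 3 => null], 800));
```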

Disabling browser initiated archiving

For users that have websites that receive a lot of visits, simply allowing on-demand archiving through the browser will cause undesirable delays. These users can disable browser initiated archiving. Read the user docs for more info.

Serving Reports

Reports are served through the API classes defined by individual plugins. API methods access persisted records, transform them into presentable reports, and serve them through Piwik's Reporting API, either in HTTP responses or to PHP code (such as Controller methods).

Transforming Records into Reports

As stated above, records are not the same as reports. Records are structured primarily to be stored, not to be read by either humans or other software. Thus API methods cannot simply access persisted data and return it. The data must be manipulated and made presentable.

DataTable Filters

DataTable instances, which are used to hold reports, are manipulated by either iterating through rows and manually making changes or through the use of DataTable filters. DataTable filters manipulate DataTable instances in some way. There are several predefined ones that allow you to do common things without having to write a lot of code.

Making a report presentable involves undoing the changes that made it more efficient to store. Column names can be changed from integer IDs to string metric names via the ReplaceColumnNames DataTable filter:

$dataTable->filter('ReplaceColumnNames');

Metadata and processed metrics should also be added within API methods. Existing filters (everything in the core/DataTable/Filter directory) can be used to perform most of these tasks.
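Conceptually, a filter is a function applied to the rows of a table. The plain-PHP sketch below mimics that idea to add a processed metric; applyRowFilter() and the filter closure are hypothetical and are not Piwik's filter API. For real reports, look in core/DataTable/Filter for an existing filter before writing your own.

```php
<?php
// Hypothetical sketch of the filter idea with plain arrays (not Piwik's API).
function applyRowFilter(array $table, callable $filter): array
{
    return array_map($filter, $table);
}

// Example filter: adds the 'nb_actions_per_visit' processed metric.
$addActionsPerVisit = function (array $row): array {
    $row['nb_actions_per_visit'] = $row['nb_visits'] > 0
        ? round($row['nb_actions'] / $row['nb_visits'], 1)
        : 0; // no visits: nothing to average
    return $row;
};

$table = [
    ['label' => 'Firefox', 'nb_visits' => 4, 'nb_actions' => 10],
    ['label' => 'Opera',   'nb_visits' => 0, 'nb_actions' => 0],
];
print_r(applyRowFilter($table, $addActionsPerVisit));
```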

API processing of reports

When a report is returned from an API method it goes through some extra processing based on what query parameters are set for the request. To see exactly what happens to a report, read the relevant section in our Piwik's Reporting API guide.

Learn more