Read this guide if
This guide assumes that you:
To analyze data is to search for patterns in a set of things. In Piwik those things are visits, web actions and goal conversions.
We search for patterns by reducing the set of things. Or in other words, we search for patterns by grouping individual things together to create subsets that are both recognizable and meaningful.
In Piwik the result of that grouping is the analytics data that it stores, displays and serves through an API. Read on to learn exactly what this data contains, how Piwik calculates and stores it, and how it is made available to Piwik users.
Piwik aggregates and persists two types of analytics data: reports and metrics. The difference between the two is that a metric is a single numeric value whereas a report is a two-dimensional array of values. Reports will normally contain metric values, but they can contain other data (either additionally or in lieu of metric values).
All reports are defined by plugins. Metrics can also be defined by plugins, but several of them, called core metrics, are defined and calculated by Piwik Core.
New reports that analyze visits, action types or conversions should contain these metrics.
The following is a list of core metrics that relate to a set of visits:
The following is a list of core metrics that relate to a single action type:
The following is a list of core metrics that relate to the set of ecommerce conversions (either all orders or all abandoned carts) recorded for a set of visits:
Goal specific metrics
The following is a list of core metrics that relate to a set of visits and one goal of a site:
Note: In the metric names displayed above, '%idGoal%' should be replaced with the ID of a goal.
Goal specific metrics are stored in the database in the 'goals' column of serialized reports. The column contains a PHP array mapping goal IDs to arrays of goal specific metric values. The AddColumnsProcessedMetricsGoal DataTable filter copies these values into normal columns, using the metric names described above.
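To make the shape of the 'goals' column concrete, here is a minimal sketch in plain PHP that does not use any Piwik classes. The function name `flattenGoalMetrics`, the `goal_<idGoal>_<metric>` column naming and the sample values are assumptions made for illustration, and this sketch drops the 'goals' column after copying it, which the real filter may or may not do.

```php
<?php
// Hypothetical helper (not part of Piwik): copies goal specific metrics out
// of the 'goals' column into ordinary columns. The "goal_<idGoal>_<metric>"
// naming is an assumption made for this sketch.
function flattenGoalMetrics(array $row): array
{
    $goals = $row['goals'] ?? [];
    unset($row['goals']);

    foreach ($goals as $idGoal => $metrics) {
        foreach ($metrics as $name => $value) {
            $row["goal_{$idGoal}_{$name}"] = $value;
        }
    }

    return $row;
}

// Illustrative row: the 'goals' column maps goal IDs to arrays of metrics.
$row = [
    'label'     => 'Firefox',
    'nb_visits' => 40,
    'goals'     => [
        1 => ['nb_conversions' => 4, 'revenue' => 20.0],
        3 => ['nb_conversions' => 1, 'revenue' => 5.0],
    ],
];

$flat = flattenGoalMetrics($row);
print_r($flat);
```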
In the interests of efficiency (in terms of both the speed of the Archiving Process and the size of the database), many metrics are not stored in the database. These metrics can be calculated using other metrics and so can be calculated right before reports are served. These metrics are collectively called processed metrics. Below is the list of processed metrics that are calculated using core metrics.
New reports that analyze visits, action types or conversions should have these metrics added when possible.
Note: Some processed metrics will appear multiple times in the lists below. These metrics have different meanings based on the reports they are in.
The following is a list of processed metrics that relate to a set of visits:
The following is a list of processed metrics that relate to a single action type:
The following is a list of processed metrics that relate to the set of ecommerce orders recorded for a set of visits:
The following is a list of processed metrics that relate to the set of ecommerce items in a set of orders or abandoned carts:
Goal specific metrics
The following is a list of processed metrics that are also specific to one goal of one site:
Note: In the metric names displayed above, '%idGoal%' should be replaced with the ID of the goal in question.
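As an illustration of how processed metrics can be derived from stored metrics just before a report is served, the following plain PHP sketch computes two ratios for a row. The helper name `addProcessedMetrics` and the column names are assumptions made for this example. In Piwik itself this work would normally be done by DataTable filters.

```php
<?php
// Hypothetical helper (not part of Piwik): derives two processed metrics
// from metrics that are assumed to be stored. Column names are assumptions.
function addProcessedMetrics(array $row): array
{
    $visits = $row['nb_visits'] ?? 0;

    if ($visits > 0) {
        $row['nb_actions_per_visit'] = round($row['nb_actions'] / $visits, 1);
        // Bounce rate expressed as a percentage of visits.
        $row['bounce_rate'] = round(100 * $row['bounce_count'] / $visits, 1);
    } else {
        // Avoid division by zero for an empty set of visits.
        $row['nb_actions_per_visit'] = 0.0;
        $row['bounce_rate'] = 0.0;
    }

    return $row;
}

$row = addProcessedMetrics([
    'nb_visits'    => 200,
    'nb_actions'   => 500,
    'bounce_count' => 50,
]);
print_r($row);
```

Because both values can be recomputed from stored columns at any time, neither needs to be persisted.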
Plugins that want to calculate and persist their own metrics must give them a name with the following format:
"PluginName_metricName" where PluginName is the name of the plugin and metricName is the name of the metric. For example:
This naming convention is required in order to determine which plugins define which metrics. Not following this convention will result in errors during the Archiving Process.
Core metrics all have special names and do not follow this convention.
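The convention can be sketched in a few lines of plain PHP. Both helper names below are hypothetical, and treating everything before the first underscore as the plugin name is an assumption made for this example. How Piwik itself parses these names is not shown here.

```php
<?php
// Hypothetical helpers (not part of Piwik) illustrating the
// "PluginName_metricName" convention.
function buildArchiveName(string $pluginName, string $dataName): string
{
    return $pluginName . '_' . $dataName;
}

// Assumption: everything before the first underscore is the plugin name.
// Returns null when the name contains no plugin prefix at all.
function getPluginFromArchiveName(string $archiveName): ?string
{
    $parts = explode('_', $archiveName, 2);

    return count($parts) === 2 && $parts[0] !== '' ? $parts[0] : null;
}

$name = buildArchiveName('MyPlugin', 'myFancyMetric');
echo $name, "\n";
echo getPluginFromArchiveName($name), "\n";
var_dump(getPluginFromArchiveName('unprefixed'));
```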
Each row contains metrics that relate to a set of visits, actions, conversions or some other entity. The set is defined and described by a special label column. How the column describes the set depends entirely upon the specific report. For example, in the UserSettings.getBrowser report, a row with the label Firefox would hold metrics for the set of all visits that used the Firefox browser.
Some reports, like VisitsSummary.get, will not have a label column. These reports will have only one row that refers to the entire set of entities.
In addition to metrics, each row can also contain metadata. This metadata will usually assist the label column in describing the set of things the row represents.
Some metadata have special meanings. For example, metadata with the name 'logo' is treated as a path to an image that is used to describe the row. This image is displayed alongside rows when reports are displayed in the UI. The UserSettings.getBrowser and UserSettings.getOs reports use this metadata value to show an icon for each browser and OS.
Metadata with the name 'url' is treated as a URL that describes the row. The label of the row is linked to this URL when reports are displayed in the UI.
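The relationship between a row's label, its metrics and its metadata can be sketched with plain PHP arrays. The array layout, the image path, the URL and the `renderLabel` helper are all assumptions made for illustration. They are not Piwik's DataTable API, and the real UI rendering may differ.

```php
<?php
// Hypothetical renderer (not part of Piwik): shows how 'logo' and 'url'
// metadata could influence the way a row's label is displayed.
function renderLabel(array $row): string
{
    $label = htmlspecialchars($row['label']);
    $meta  = $row['metadata'] ?? [];

    if (isset($meta['url'])) {
        // The label is linked to the URL that describes the row.
        $label = '<a href="' . htmlspecialchars($meta['url']) . '">' . $label . '</a>';
    }
    if (isset($meta['logo'])) {
        // The image is displayed alongside the row.
        $label = '<img src="' . htmlspecialchars($meta['logo']) . '" alt=""> ' . $label;
    }

    return $label;
}

// Illustrative rows: a label, some metrics, and optional metadata.
$rows = [
    [
        'label'    => 'Firefox',
        'columns'  => ['nb_visits' => 120, 'nb_actions' => 340],
        'metadata' => ['logo' => 'icons/firefox.gif'],
    ],
    [
        'label'    => 'example.org',
        'columns'  => ['nb_visits' => 8, 'nb_actions' => 15],
        'metadata' => ['url' => 'http://example.org/'],
    ],
];

foreach ($rows as $row) {
    echo renderLabel($row), ' (', $row['columns']['nb_visits'], " visits)\n";
}
```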
Reports can be hierarchical. Each row in a report can be attached to another table of data. Any row in those tables can be attached to more tables, and so on ad infinitum. Tables that are attached to rows are called subtables.
Subtables provide further analytics for the set of visits that a row represents. For example, the Actions.getPageUrls report contains rows that describe a set of page view actions based on the first part of the page's URL. If this part is a directory and not a file, the row may have a subtable that describes that row's set of page view actions based on the second part of the page's URL.
Another example: the Referrers.getSearchEngines report contains a row for each search engine that was used in a visit. Each row will have a subtable that describes the keywords that were used with that search engine. The subtable rows will contain metric values for visits that used a specific keyword (determined by the subtable row) with a specific search engine (determined by the parent row).
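The search engine example can be modelled with nested PHP arrays, as a rough sketch of how a subtable row is qualified by its parent row. The array layout, the sample numbers and the `flattenReport` helper are assumptions made for illustration and do not reflect how DataTable stores subtables internally.

```php
<?php
// Illustrative hierarchical report: each search engine row carries a
// subtable of keyword rows. Labels and numbers are made up.
$searchEngines = [
    ['label' => 'Google', 'nb_visits' => 30, 'subtable' => [
        ['label' => 'piwik',         'nb_visits' => 20],
        ['label' => 'web analytics', 'nb_visits' => 10],
    ]],
    ['label' => 'Bing', 'nb_visits' => 5, 'subtable' => [
        ['label' => 'piwik', 'nb_visits' => 5],
    ]],
];

// Hypothetical helper: walks the hierarchy recursively and returns a flat
// map of "parent > child" paths to visit counts.
function flattenReport(array $rows, string $prefix = ''): array
{
    $flat = [];
    foreach ($rows as $row) {
        $path = $prefix === '' ? $row['label'] : $prefix . ' > ' . $row['label'];
        $flat[$path] = $row['nb_visits'];
        if (!empty($row['subtable'])) {
            // Paths are unique, so the array union never overwrites entries.
            $flat += flattenReport($row['subtable'], $path);
        }
    }

    return $flat;
}

$flat = flattenReport($searchEngines);
print_r($flat);
```

The same keyword appears under both engines, but each subtable row only counts the visits that combined that keyword with its parent engine.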
Reports should be named in the same way as non-core metrics. That is, they should have a name with the following format:
"PluginName_reportName" where PluginName is the name of the plugin and reportName is the name of the report. For example:
Plugins that do not follow this convention will cause errors during the Archiving Process.
Reports and metrics provide analytics data about a set of things. Piwik determines what is in this set by using three constraints: a website ID, a period and a segment.
The website ID selects visits that were tracked for a specific website. This ID is specified in all HTTP requests by the idSite query parameter.
The period selects visits that were tracked within a specific date range. The period is specified in all HTTP requests by the date and period query parameters.
The segment selects visits based on a boolean expression that uses visit properties. It is specified in all HTTP requests by the segment query parameter and can be used to select almost any conceivable subset of visits.
Analytics parameters are normally stored in reports as report metadata (that is, they are stored as DataTable metadata).
Every report and metric describes a set of things determined by these three parameters: the website, period and segment.
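A request that sets all three constraints might be assembled as in the sketch below. The `buildReportQuery` helper is hypothetical. The `idSite`, `period`, `date` and `segment` parameters come from the text above, while the `module` and `method` parameters, the `index.php` entry point and the sample segment expression are assumptions made for this example.

```php
<?php
// Hypothetical helper (not part of Piwik): builds a query string that
// carries the website ID, the period and an optional segment.
function buildReportQuery(
    string $method,
    int $idSite,
    string $period,
    string $date,
    ?string $segment = null
): string {
    $params = [
        'module' => 'API',   // assumption: name of the API entry module
        'method' => $method, // assumption: report selected by method name
        'idSite' => $idSite,
        'period' => $period,
        'date'   => $date,
    ];
    if ($segment !== null) {
        $params['segment'] = $segment;
    }

    // http_build_query() URL-encodes the values, including the segment.
    return 'index.php?' . http_build_query($params);
}

$url = buildReportQuery('UserSettings.getBrowser', 1, 'week', '2014-01-06', 'browserCode==FF');
echo $url, "\n";
```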
When persisted, reports and metrics are collectively termed Archive Data, which simply means that the data has been cached and does not need to be re-calculated.
Persisted reports and metrics are indexed by the website ID, period and segment. The date and time that the data was calculated and cached is also attached to each report and metric. To learn the specifics of how this is done with MySQL see the Piwik database schema.
Metrics are numeric values and so there is nothing special done when persisting them. The website ID, period, segment and datetime of caching are attached to the metric value, and all this information is saved.
Reports are complex data structures and so there is some extra processing required before they are persisted.
Finally, the website ID, period, segment and datetime of caching are attached to the compressed data, and all of this information is then saved.
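The idea of indexing persisted data by website, period and segment can be sketched with an in-memory array standing in for the database. Everything here is an assumption made for illustration: the helper names, the key format, the `ts_archived` field and the use of `serialize()` with `gzcompress()`. Piwik's actual storage format is described in the database schema guide.

```php
<?php
// Hypothetical key builder: one key per website/period/segment combination.
function archiveKey(int $idSite, string $period, string $date, string $segment): string
{
    return implode('|', [$idSite, $period, $date, md5($segment)]);
}

// Hypothetical writer: serializes and compresses the rows, then stores them
// together with the time at which they were cached.
function persistRecord(array &$store, string $key, string $name, array $rows): void
{
    $store[$key][$name] = [
        'value'       => gzcompress(serialize($rows)),
        'ts_archived' => gmdate('Y-m-d H:i:s'),
    ];
}

// Hypothetical reader: returns the original rows, or null if nothing is cached.
function loadRecord(array $store, string $key, string $name): ?array
{
    if (!isset($store[$key][$name])) {
        return null;
    }

    return unserialize(gzuncompress($store[$key][$name]['value']));
}

$store = [];
$rows  = [['label' => 'Firefox', 'nb_visits' => 120]];
$key   = archiveKey(1, 'day', '2014-01-06', '');

persistRecord($store, $key, 'MyPlugin_myFancyReport', $rows);
var_dump(loadRecord($store, $key, 'MyPlugin_myFancyReport') === $rows);
```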
When a report is archived, it is called a record not a report. We make a distinction because multiple reports can sometimes be generated from one record.
For example, the UserSettings plugin uses one record to hold visits by browser information. This record is used to generate both the UserSettings.getBrowserVersion report and the UserSettings.getBrowser report. The second report is produced by processing the first. The plugin could have archived both reports, but this would have wasted a great deal of space, since the extra report would be cached for every website/period/segment combination.
Record storage guidelines
Care must be taken to store as little as possible when persisting records. Make sure to follow the guidelines below before inserting records as archive data:
Analytics data is calculated and cached on-demand. When a report for a specific website, period and segment (if any) is requested, Piwik will check if the data has been cached, and if not Piwik will generate and cache it.
Archiving logic (the logic that calculates and caches analytics data) is defined by individual plugins. When archiving is initiated, every report defined by a plugin is archived together, rather than individually.
If no segment is supplied in the data query and data cannot be found, every report of every plugin will be generated and cached all at once. If a segment is supplied, then the reports that belong to the same plugins as the requested data will be generated and cached.
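The on-demand behaviour described above is essentially a cache lookup followed by a computation on a miss. The sketch below shows that pattern in plain PHP. The `getOrArchive` helper, the cache key format and the sample data are assumptions, and the sketch ignores the fact that Piwik archives all of a plugin's reports together.

```php
<?php
// Hypothetical helper (not part of Piwik): returns cached analytics data,
// computing and caching it first if it is missing.
function getOrArchive(array &$cache, string $key, callable $compute): array
{
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $compute();
    }

    return $cache[$key];
}

$cache = [];
$calls = 0;

// Stand-in for the expensive archiving logic; counts how often it runs.
$compute = function () use (&$calls): array {
    $calls++;
    return ['nb_visits' => 42];
};

$first  = getOrArchive($cache, 'site1|day|2014-01-06', $compute);
$second = getOrArchive($cache, 'site1|day|2014-01-06', $compute);

echo "archiving ran {$calls} time(s)\n";
```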
Plugins that want to archive reports and metrics define a class called Archiver that extends from Piwik\Plugin\Archiver. This class will be automatically detected and instantiated by Piwik during the archiving process.
Reports and metrics are calculated differently based on the period type.
For day periods, the visits/actions/conversions/etc. (called log data) are themselves aggregated.
For other periods, the reports and metrics for the days within the period are aggregated together. For example, when generating a report for a week period, the report for each day within the week (i.e., Monday, Tuesday, Wednesday, etc.) will be queried and then aggregated together. This is far faster than aggregating each individual visit/action/etc. that was tracked during the entire week, but produces the same result.
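Aggregating daily reports into a report for a longer period can be pictured as summing rows that share a label. The following plain PHP sketch does this for metrics that can simply be added together. The `aggregateReports` helper, the array layout and the numbers are assumptions made for illustration, and the sketch is not how ArchiveProcessor is implemented.

```php
<?php
// Hypothetical helper (not part of Piwik): sums the metrics of rows that
// share a label across several daily reports. Only suitable for metrics
// that are additive, such as counts.
function aggregateReports(array $dailyReports): array
{
    $total = [];
    foreach ($dailyReports as $report) {
        foreach ($report as $label => $metrics) {
            foreach ($metrics as $name => $value) {
                $total[$label][$name] = ($total[$label][$name] ?? 0) + $value;
            }
        }
    }

    return $total;
}

// Illustrative daily reports, keyed by row label.
$monday = [
    'Firefox' => ['nb_visits' => 10, 'nb_actions' => 25],
    'Chrome'  => ['nb_visits' => 4,  'nb_actions' => 9],
];
$tuesday = [
    'Firefox' => ['nb_visits' => 7, 'nb_actions' => 12],
    'Safari'  => ['nb_visits' => 3, 'nb_actions' => 3],
];

$week = aggregateReports([$monday, $tuesday]);
print_r($week);
```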
Log data aggregation is handled by the LogAggregator class. Archive data aggregation is handled by the ArchiveProcessor::aggregateDataTableRecords and ArchiveProcessor::aggregateNumericMetrics methods. Plugins can access a LogAggregator instance and an ArchiveProcessor instance through the Piwik\Plugin\Archiver class.
To learn more about how aggregation is accomplished with Piwik's MySQL backend, read the Piwik database schema guide.
Reports and metrics are persisted using the ArchiveProcessor class. Metrics are inserted using the ArchiveProcessor::insertNumericRecord method. Reports are first serialized using the DataTable::getSerialized method and then inserted using the ArchiveProcessor::insertBlobRecord method:
    use Piwik\Config;
    use Piwik\Metrics;

    $archiveProcessor = // ...

    // insert a numeric value
    $myFancyMetric = // ... calculate the metric value ...
    $archiveProcessor->insertNumericRecord('MyPlugin_myFancyMetric', $myFancyMetric);

    // insert a record (with all of its subtables)
    $maxRowsInTable = Config::getInstance()->General['datatable_archiving_maximum_rows_standard'];
    $dataTable = // ... build by aggregating visits ...
    $serializedData = $dataTable->getSerialized(
        $maxRowsInTable,
        $maxRowsInSubtable = $maxRowsInTable,
        $columnToSortBy = Metrics::INDEX_NB_VISITS
    );
    $archiveProcessor->insertBlobRecord('MyPlugin_myFancyReport', $serializedData);
Though data can be generated on demand, relying on this for all users would be highly inefficient and create a poor user experience. Any user whose websites receive a significant number of visits would experience a long delay before being able to view reports. Piwik solves this problem with a console command that launches the archiving process. The command should be executed by cron.
The console command can archive data for every website and for every period except range periods. Reports & metrics for stored segments will also be archived.
The console command will remember when it was last executed and will only initiate the archiving process for a website if there have been visits since that time.
For users that have websites that receive a lot of visits, simply allowing on-demand archiving through the browser will cause undesirable delays. These users can disable browser initiated archiving. Read the user docs for more info.
Reports are served through the API classes defined by individual plugins. API methods access persisted records, transform them into presentable reports and serve them through Piwik's Reporting API, either in HTTP responses or to PHP code (such as Controller methods).
As stated above, records are not the same as reports. Records are structured primarily to be stored, not to be read by humans or other software. Thus API methods cannot simply access persisted data and return it. The data must first be manipulated and made presentable.
DataTable instances, which are used to hold reports, are manipulated by either iterating through rows and manually making changes or through the use of DataTable filters. DataTable filters manipulate DataTable instances in some way. There are several predefined ones that allow you to do common things without having to write a lot of code.
Making a report presentable involves undoing the changes that made it more efficient to store. For example, column names can be changed from integer IDs to string metric names via the ReplaceColumnNames DataTable filter.
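What such a renaming step does can be sketched in plain PHP without any Piwik classes. The `replaceColumnNames` function below is a hypothetical stand-in written for this example, not the ReplaceColumnNames filter itself, and the ID-to-name map is an illustrative assumption.

```php
<?php
// Illustrative mapping from integer column IDs to metric names.
// The specific IDs are assumptions made for this sketch.
const COLUMN_NAMES = [
    1 => 'nb_uniq_visitors',
    2 => 'nb_visits',
    3 => 'nb_actions',
];

// Hypothetical stand-in for a renaming filter: replaces known integer
// column IDs with metric names and leaves all other columns untouched.
function replaceColumnNames(array $rows, array $idToName): array
{
    return array_map(function (array $row) use ($idToName): array {
        $renamed = [];
        foreach ($row as $column => $value) {
            $renamed[$idToName[$column] ?? $column] = $value;
        }
        return $renamed;
    }, $rows);
}

// A stored row uses compact integer IDs instead of readable names.
$stored = [
    ['label' => 'Firefox', 2 => 120, 3 => 340],
];

$presentable = replaceColumnNames($stored, COLUMN_NAMES);
print_r($presentable);
```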
Metadata and processed metrics should also be added within API methods. Existing filters (everything in the core/DataTable/Filter directory) can be used to perform most of these tasks.
When a report is returned from an API method, it goes through some extra processing based on the query parameters set for the request. To see exactly what happens to a report, read the relevant section in the Piwik Reporting API guide.