Customer data analysis: combining managerial assumptions and
computations
Computational techniques and tools for the task
• MS Excel module (pivoting, visualisation)
• Statistica advanced models (Neural networks, association rules). The Statistica software can
be installed from the university resources inet.muni.cz. The customer data set, which is
created for Pivot analysis in Excel is inserted to Statistica data sheet.
• Viscovery SoMine (SOM, clustering, cluster research). The software is installed from the
internet, as trial version http://www.viscovery.net/. The same data set created in Excel is
used for analysis.
The task is performed individually.
The data file is the same for everyone : CRM_data_for_analysis.xls
Data mining procedures:
1. Exploration (data cleansing, e.g. dealing with negative values, transformation of the
provided transaction data set by creating new derived variables for building customer data
set.
2. Model building by applying suggested computational techniques
3. Deployment: using the elaborated models for generating insights
Pivot analysis (MS Excel)
Pivot analysis functions are used for creating variables for analysis, making analytical reports and
for visualisation.
Excel-Insert-Pivot Table (Pivot chart)
Analye the pivot table functions: row and column labels (limited number of values), their hierarchy
and grouping. Use right mouse click for selecting field settings (default is Sum).
When you prepare the variables you can copy them to the separate final data sheet.
1. The analysis is perfomed by using all variables computed during the data exploration
(transformation) phase. Please Among these variables are:
1.1. Tenure (duration of life cycle in days)
1.2. Average purchase during one visit
1.3. Average number of days between visits
1.4. Standard deviation of the selected variables
1.5. Indicators of „good“, „bad“ or other types of customers
1.6. categorical variables for entitling meaningful subsets, frequency of routes, etc.
1.7. Variable R (Recency) show the number of days since the last visit till the date set for
analysis (you can use the „Today“ or any other date (e.g. end of the year).
1.8. Variable F indicator is equal to the number of visits of the customer. In our task F is equal
to the number of completed transactions without reimbursements).
1.9. The M is equal to the total value of purchases during all the history of communication.
2. Customer analysis by comparing two segments: individuals and firms. Please use various
diagrams for graphical visualisation
3. RFM analysis (Recency, Frequency, Monetary value). The three indicators of R, F and M are
computed for each customer.
4. Loyalty and Churn analysis – create variables and define the formulas for their values suitable
for analysis of the level of loyalty (defined by yourself) of the customers. Select two variables
for defining loyalty (frequency of visits or other variables). and to suggets formula for
computing level of risk for churn.
5. Pareto analysis (testing of the statement if the turnover generated by 20% of best customers
equals to 80% of total turnover of the enteprise).
6. „Whale curve“ analysis: in order to plot it you have to sort customers by descending order of
their turnover values, to compute thier cumulative percent values and to plot to Y axis. In X
axis you plot the cumulative percent of the number of customers (e.g. if the enterprise has 10
customers, each of them makes 10% of the enterprise customers, second line will show
cumulative of 2 customers which make 20 cumulative percent, etc. Till last line showing 100%).
The Whale curve shows what percent of customers (plotted as cumulative from 0 to 100 % in
axis) generate related part of the total enterprise turnover (plotted in % in Y axis from 0 to 100).
7. Analysis by popularity among individuals and business travellers, by showing their loyalty
breakdown
8. Whale curve analysis and interpetation. Goal seek for desired monetary values.
Each task please fulfil in separate Sheet of Excel. In each sheet please provide your calculation and
interpretation of the results.
Neural network analysis
Copy the data from Excel to Statistics
Assign the names in the first line as Variable names
Statistics-Neural Networks menu (SNN)
Select analysis :
The intelligent problem solver can select the best network for you, but we shall analyse the NN
types by selection, therefore we‘ll use „Custome network designer“.
Quick menu:
Problem type- classification
Variables: define which are input and which are output variables. Variable types: here you have to
define the type of variable as you created them: continuous for historical numeric values,
categorical for the meaningful subsets as you called them (e.g. loalty category 1,2 or 3) .
Compare the ability to classify the data set for three types of networks:
-Multilayer perceptron
- Radial basis function
-Probabilistic neural network
When you get results, please analyse them by interpretation of numeric and visual reports (Quick
menu)
a) What is the train, select and test performance ? (similar in each phase, overfit, etc)
b) What are predictions?
c) Descriptive statistics (recognize correctly or wrongly)
d) Sensitivity (importance of variables for classification)
e) Response surface for interpretation of visualized results
Select the best neural network type. You can see the suummary of all models in Networks/Ensemble
For illustration of your report use Edit-Screen catcher-Capture rectangle
Register this NN model Select model
In Statistics-Neural Networks menu (SNN) take „Run existing model“ . Here you can apply the
selected best model for analysing the other data subset of the same problem.
Viscovery Somine
Import the datafile from Excel Spreadsheet (you may need to have this file saved in Excel 2003
format.)
The Self organizing map will separate data in the number of cluster fdefine by you, only in this case
the clusters will mean data group according to the similarity revealed by the SOM algorithm, not by
the output variables created by yourselves.
• Explore the variable value ranges withing clusters
• Find separate customers in the clusters and their characteristics.
• Input the characteristics of „new“ customer record and find out the cluster where it belongs.
• Analyse clusters by the impact of separate variables. Are there any clusters which you can
decribe according to articular value ranges of your variables in the data set