Assignment 2


                       Customer relation marketing : system and computations


Computational techniques and tools for the task:


  * MS Excel modules (pivoting, visualisation graphs)

Statistica advanced models (Neural networks, association rules). The Statistica software can be
installed from the licenced university resources inet.muni.cz.  (index, software offer,
https://it.muni.cz/sluzby/software/statistica

https://inet.muni.cz/app/soft/licence

Title of software

                 Statistica 13.3

Firm

                 StatSoft

Lokalization

                 EN - Anglická verze

Description

                 Jednouživatelská verze

Validity from/to

                 11.12.2017/31.12.2018
  * The instructions of using Statistica software: http://www.statsoft.com/Textbook
  * In the first stage of the assignment the customer data set  is created from initial data file
by applying Pivot analysis in Excel
  * During the second stage of assignment the created customer data set is inserted to Statistica
software as the data sheet for further computational analysis (Neural neworks, Association rules).
  * Viscovery SoMine (SOM, clustering, cluster research). The software is installed from the
internet, as trial version http://www.viscovery.net/. The same data set created in Excel is used
for analysis.


The task is performed in teams of 2-3 people

.

The initial data file is the same for everyone : CRM_coded_explore.xlsx


Data mining procedures:
 1. Exploration (data cleansing, e.g. dealing with negative values, transformation of the provided
transaction data set by creating new derived variables for building customer data set.
 2. Model building by creating variables and applying suggested computational techniques for their
analysis
 3. Deployment: using the elaborated models for generating insights


Creating CRM variables (Pivot ,MS Excel)


Pivot analysis functions are used for creating variables for analysis, making analytical reports
and for visualisation.

1.      The analysis is perfomed by using all variables computed during the data exploration
(transformation) phase. Among these variables are:

1.1.   Tenure (duration of life cycle in days)

1.2.   Average purchase during one visit

1.3.   Average number of days between visits

1.4.    Standard deviation of the selected variables

1.5.    Indicators of „good“, „bad“ or other types of customers

1.6.   categorical variables for entitling meaningful subsets, frequency of routes, etc.

1.7.    Variable R (Recency) show the number of days since the last visit till the date set for
analysis (you can use the „Today“ or any other date (e.g. end of the year).

1.8.   Variable F indicator is equal to the number of visits of the customer. In our task F is
equal to the number of completed transactions without reimbursements).

1.9.   The M is equal to the total value of purchases during all the history of communication.


Pivot analysis (MS Excel)


2.      Customer analysis by comparing two segments: individuals and firms. Please use various
diagrams for graphical visualisation

3.      RFM analysis (Recency, Frequency, Monetary value). The three indicators of R, F and M are
computed for each customer.

4.      Loyalty and Churn analysis – create variables and define the formulas for their values
suitable for analysis of the level of loyalty (defined by yourself) of the customers. Select two
variables for defining loyalty (frequency of visits or other variables). and to suggets formula for
computing level of risk for churn.

5.      Pareto analysis (testing of the statement if the turnover generated by 20% of best
customers equals to 80% of total turnover of the enteprise).

6.      „Whale curve“ analysis: in order to plot it you have to sort customers by descending order
of their turnover values,  to compute thier cumulative percent values and to plot to Y axis. In X
axis you plot the cumulative percent of the number of customers (e.g. if the enterprise has 10
customers, each of them makes 10% of the enterprise customers, second line will show cumulative of
2 customers which make 20 cumulative percent, etc. Till last line showing 100%). The Whale curve
shows what percent of customers (plotted as cumulative from 0 to 100 % in axis) generate related
part of the total enterprise turnover (plotted in % in Y axis from 0 to 100).

7.      Analysis by popularity among individuals and business travellers, by showing their loyalty
breakdown

8.      Whale curve analysis and interpetation. Goal seek for desired monetary values.


Each task please fulfil in separate Sheet of Excel. In each sheet please provide your calculation
and interpretation of the results.

The outcome –  Excel data file including initial data table, its pivot transformations, comments of
applied analysis and the customer data table for further computational analysis


Neural network  analysis


Copy the prepared customer data from Excel to Statistics

Assign the names in the first line as Variable names

Statistics-Neural Networks menu (SNN)

Select analysis :

The intelligent problem solver can select the best network for you, but we shall analyse the NN
types by selection, therefore we‘ll use „Custome network designer“.

Quick menu:

Problem type- classification

Variables:  define which are input and which are output variables. Variable types: here you have to
define the type of variable as you created them: continuous for historical numeric values,
categorical for the meaningful subsets as you called them (e.g. loalty category 1,2 or 3) .

Compare the ability to classify the data set for three types of networks:

-Multilayer  perceptron

- Radial basis function

-Probabilistic neural network

When you get results, please analyse them by interpretation of numeric and visual reports (Quick
menu)

a)      What is the train, select and test performance ? (similar in each phase, overfit, etc)

b)      What are predictions?

c)      Descriptive statistics (recognize correctly or wrongly)

d)      Sensitivity (importance of variables for classification)

e)      Response surface for interpretation of visualized results

Select the best neural network type. You can see the suummary of all models in Networks/Ensemble

For illustration of your report use Edit-Screen catcher-Capture rectangle

Register this NN model  Select model

In Statistics-Neural Networks menu (SNN)  take „Run existing model“ . Here you can apply the
selected best model for analysing the  other data subset of the same problem.


Association rules


Make sure that the data set has at least four categorical variables. Use the appropriate menu item
„Association rules“. Modify the required level of confidence, correlation and support of the rules.
Define the strongest rules of your data set.


Viscovery Somine


Import the datafile from Excel Spreadsheet (you may need to have this file saved in Excel 2003
format.)

The Self organizing map will separate data in the number of cluster fdefine by you, only in this
case the clusters will mean data group according to the similarity revealed by the SOM algorithm,
not by the output variables created by yourselves.
  * Explore the variable value ranges withing clusters
  * Find separate customers in the clusters and their characteristics.
  * Input the characteristics of „new“ customer record and find out the cluster where it belongs.
  * Analyse clusters by the impact of separate variables.  Are there any clusters which you can
decribe according to articular value ranges of your variables in the data
set