Data Analytics and Visualization for a Marketing Campaign

Direct Marketing Campaign of a Bank

Data Science in Marketing

The main goal of data science in marketing is to find the ideal audience and clients. Using data science to target a desirable market reduces the chance of loss. Targeting the right customer is done with the help of data analytics: analytics reveals the characteristics of people, and on the basis of those characteristics companies and businesses can decide which customers are likely to buy their services and products. This is the homework a company routinely does before making a proposal to any customer, and a customer's potential matters most when deciding whom to target. The approach is not built on assumptions; evidence in the form of statistics and visual representations of the data provides real insight. In marketing, data analytics can be used to answer many questions. There are no predetermined criteria for analytics; they vary from project to project and from task to task. One thing is common, a universal rule: it begins with a question.

Data analytics and visualization report.



Scenario.

The goal is a data science solution that lowers the cost of a direct marketing campaign by using data from the previous campaign to target the right clients. Working with this banking dataset, data scientists can consider the following scenario. A bank has decided to reduce the cost of a direct marketing campaign in which a product is offered to clients through phone calls. Analytics and visualization are expected to support the campaign with a cost-efficient solution: a profile of the right customer for the bank, that is, of the client who is likely to buy the product.

Banking Data

The data provided for the direct marketing campaign is in CSV (comma-separated values) format and contains 41188 observations in total. The dataset is untidy and not ready for processing: some columns are of type object, and their values are strings. The target column is named y and takes two values, "yes" and "no". "yes" means the client accepted the offer, and "no" means the client did not.
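A minimal sketch of loading the data with pandas and checking how the target is balanced. The file name bank-additional-full.csv in the comment is an assumption, and a few made-up rows (not the bank data) stand in for the real file so the snippet runs on its own. If the real file uses a delimiter other than a comma, pass it through the sep parameter of read_csv.

```python
import io

import pandas as pd

# Made-up rows standing in for the real campaign file (hypothetical values).
SAMPLE_CSV = """age,job,marital,duration,campaign,y
35,admin.,married,210,1,yes
52,blue-collar,married,45,3,no
28,student,single,12,2,no
41,management,divorced,380,1,yes
60,retired,married,90,6,no
"""

# With the real data this would be something like pd.read_csv("bank-additional-full.csv"),
# adding sep=";" if the file turns out to be semicolon-delimited (an assumption to check).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)                # (rows, columns); the full dataset has 41188 rows
print(df["y"].value_counts())  # how many clients answered yes vs. no
```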

Background On Data

The data was recorded during the bank's previous direct marketing campaign in the past year. All the calls were pre-recorded, and no negotiation took place on the phone. After listening to the details on the call, a customer who wanted the product subscribed to it through an online portal.

Attributes of Banking Data set

The provided attribute information is as follows:
• 1 - age (numeric)

• 2 - job : type of job (categorical: ’admin.’,’blue-collar’,’entrepreneur’,’housemaid’,
’management’,’retired’,’self-employed’,’services’,’student’,’technician’,’unemployed’
,’unknown’)

• 3 - marital : marital status (categorical: ’divorced’,’married’,’single’,’unknown’;
note: ’divorced’ means divorced or widowed)

• 4 - education (categorical: ’basic.4y’,’basic.6y’,’basic.9y’,’high.school’,’illiterate’,
  ’professional.course’,’university.degree’,’unknown’)

• 5 - default: has credit in default? (categorical: ’no’,’yes’,’unknown’)

• 6 - housing: has housing loan? (categorical: ’no’,’yes’,’unknown’)

• 7 - loan: has personal loan? (categorical: ’no’,’yes’,’unknown’)

Related to the last contact of the current campaign:

• 8 - contact: contact communication type (categorical: ’cellular’,’telephone’)

• 9 - month: last contact month of year (categorical: ’jan’, ’feb’, ’mar’,
       ..., ’nov’, ’dec’)

• 10 - day_of_week: last contact day of the week (categorical: ’mon’,’tue’,’wed’,’thu’,’fri’)

• 11 - duration: last contact duration, in seconds (numeric).
  Important note: this attribute highly affects the output target (e.g., if duration=0 then y=’no’). Yet the duration is not known before a call is performed, and after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

other attributes:

• 12 - campaign: number of contacts performed during this campaign
          and for this client (numeric, includes last contact)
• 13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
• 14 - previous: number of contacts performed before this campaign and for this client (numeric)
• 15 - poutcome: outcome of the previous marketing campaign (categorical: ’failure’,’nonexistent’,’success’)

• 16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
• 17 - cons.price.idx: consumer price index - monthly indicator (numeric)
• 18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
• 19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
• 20 - nr.employed: number of employees - quarterly indicator (numeric)
Output variable (desired target):
• 21 - y - has the client subscribed to a term deposit? (binary: ’yes’,’no’)
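The notes on duration and pdays in the list above translate into two preparation steps. A minimal sketch, assuming a pandas DataFrame with those column names; the rows are made up, and the previously_contacted column is a hypothetical name introduced here for illustration.

```python
import pandas as pd

# Hypothetical rows using three of the attributes listed above.
df = pd.DataFrame({
    "duration": [210, 0, 95],
    "pdays": [999, 6, 999],
    "y": ["yes", "no", "no"],
})

# duration is only known after the call ends, so drop it for a realistic model.
features = df.drop(columns=["duration"])

# pdays == 999 is a sentinel for "not previously contacted", not a real day count.
# "previously_contacted" is a hypothetical helper column, not part of the dataset.
features["previously_contacted"] = (features["pdays"] != 999).astype(int)
features["pdays"] = features["pdays"].where(features["pdays"] != 999)  # 999 -> NaN

print(features)
```

Keeping the sentinel as 999 would make those clients look like they were contacted very long ago, which distorts any average or model that treats pdays as a real number of days.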


EDA and Visualization

HEAD
Using the pandas head() method, we see the first five rows of the banking dataset. The head shows that the dataset contains both categorical and numeric data. The column "y" is our target variable and takes two values, yes and no. Calling head() is one of the first and most necessary steps in data analysis, since it helps us understand the overall structure of the dataset.
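A small sketch of the head() call, run on a few made-up rows shaped like the banking data rather than on the real file:

```python
import pandas as pd

# Hypothetical rows shaped like the banking data (not real records).
df = pd.DataFrame({
    "age": [30, 47, 25, 61, 38, 52],
    "job": ["technician", "admin.", "student", "retired", "services", "management"],
    "marital": ["married", "married", "single", "divorced", "single", "married"],
    "y": ["no", "yes", "no", "no", "yes", "no"],
})

first_rows = df.head()       # first five rows; head(n) returns n rows instead
print(first_rows)
print(df.columns.tolist())   # column names, including the target "y"
```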

INFO

Using the info() method on the banking dataset, we can see the total number of columns and their types. There are 21 columns in total (index 0 to 20) and 41188 rows, or observations (index 0 to 41187). Five columns are of type float (float64), five are of type integer (int64), and the remaining eleven are of type object. The object columns holding yes/no values were converted to 1 and 0 using the replace() method and then cast to integers with astype(int). The pandas info() method gives a concise summary of a DataFrame, which makes it especially handy during exploratory analysis; to get a quick overview of the dataset, we call DataFrame.info().
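A minimal sketch of the type summary and the yes/no conversion on made-up rows. It uses Series.map for the yes-to-1, no-to-0 mapping instead of the replace() call described above; map turns any value outside the mapping (such as ’unknown’, which appears in several real columns) into NaN, so astype(int) fails loudly instead of silently keeping strings.

```python
import pandas as pd

# Hypothetical rows: one float, one integer, and three object (string) columns.
df = pd.DataFrame({
    "euribor3m": [4.857, 1.313, 4.961],
    "age": [34, 51, 29],
    "housing": ["yes", "no", "yes"],
    "loan": ["no", "no", "yes"],
    "y": ["no", "yes", "no"],
})

df.info()                        # concise summary: columns, non-null counts, dtypes
print(df.dtypes.value_counts())  # number of columns per dtype

# Map yes/no to 1/0, then cast to int. Values such as "unknown" would become NaN
# here and make astype(int) raise, so they need handling first in the real data.
binary_cols = ["housing", "loan", "y"]
df[binary_cols] = df[binary_cols].apply(lambda s: s.map({"yes": 1, "no": 0})).astype(int)
print(df.dtypes)
```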

We will start exploring with the fundamental features. In this scatter plot we look at three main features: age, duration and campaign. The reason is simple: think of yourself as the campaign lead who wants to find where the potential is. The provided data holds observations from the previous campaign, and by analyzing them we can determine which clients to target first.
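Alongside the scatter plot, a quick numeric comparison of the same three features between clients who said yes and no can be sketched with a pandas groupby. The rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical rows with the three features discussed above.
df = pd.DataFrame({
    "age": [25, 38, 44, 59, 33, 47],
    "duration": [400, 350, 60, 30, 500, 80],
    "campaign": [1, 2, 4, 7, 1, 3],
    "y": ["yes", "yes", "no", "no", "yes", "no"],
})

# Average of each feature for clients who answered yes vs. no.
summary = df.groupby("y")[["age", "duration", "campaign"]].mean()
print(summary)
```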

Why is age first?

We explore the age column further because it helps us target the right client more accurately. Age is also essential because it helps differentiate the many bank customers who have deposited their funds. Several assumptions guide this exploration: for example, young people may be more likely to buy the product the bank is offering and older people less likely, since the product provides long-term benefits that mostly help younger clients. Remember, though, that these are assumptions and must be confirmed by visualizing the data.
 
Here we can clearly see that the majority of people are between thirty and forty-five years old. For a more accurate investigation, we plot a box plot.
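The age groups behind this kind of histogram can be computed with pd.cut. A minimal sketch on made-up ages; the band edges are an illustrative choice, not taken from the original analysis.

```python
import pandas as pd

# Hypothetical ages; with the real data use df["age"].
ages = pd.Series([22, 31, 34, 36, 39, 41, 44, 47, 52, 63])

# Bucket ages into bands. Bins are right-closed: (17, 29], (29, 45], (45, 60], (60, 100].
bands = pd.cut(ages, bins=[17, 29, 45, 60, 100], labels=["18-29", "30-45", "46-60", "61+"])
share = bands.value_counts(normalize=True).sort_index()  # fraction of clients per band
print(share)
```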
In this box plot, the box, which holds the middle half of the ages around the median, spans roughly thirty to fifty, which confirms the result from the provided dataset.
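The numbers a box plot draws can be read off directly with Series.quantile: the box runs from the first quartile (Q1) to the third (Q3), and the line inside it is the median. A sketch on made-up ages; with matplotlib installed, ages.plot.box() draws the plot itself.

```python
import pandas as pd

ages = pd.Series([22, 31, 34, 36, 39, 41, 44, 47, 52, 63])  # hypothetical ages

# Quartiles use pandas' default linear interpolation.
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # interquartile range: the height of the box
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
```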
The correlation graph of people who said yes and no, with respect to age and job, shows that almost every age group and every job group has an equal amount of yes and no answers.
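A row-normalised cross-tabulation gives the share of yes and no answers within each group, which makes groups of different sizes comparable. A minimal sketch on made-up rows, using job as the grouping column; the same call works for age bands.

```python
import pandas as pd

# Hypothetical rows (not the bank data).
df = pd.DataFrame({
    "job": ["admin.", "admin.", "admin.", "student", "student", "retired", "retired", "retired"],
    "y":   ["yes", "no", "no", "yes", "no", "no", "no", "yes"],
})

# normalize="index" divides each row by its total, so every row sums to 1.
rates = pd.crosstab(df["job"], df["y"], normalize="index")
print(rates)
```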


Marital Status

The majority of customers are married. The second-largest group is single people, the third is divorced people, and the rest are unknown.
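The ranking of marital status groups comes from a simple count. A sketch on made-up values:

```python
import pandas as pd

marital = pd.Series(["married", "married", "single", "married", "divorced",
                     "single", "married", "unknown"])  # hypothetical values

counts = marital.value_counts()                # sorted from most to least common
shares = marital.value_counts(normalize=True)  # same, as fractions of all clients
print(counts)
print(shares.round(3))
```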

Calls

The distribution of call duration shows that most of the calls made to customers lasted around twenty-five seconds in total. Calls shorter than 25 seconds can be assumed to be unanswered or to reach customers who were not interested.
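The share of calls below the 25-second threshold mentioned above can be computed as the mean of a boolean mask. A sketch on made-up durations:

```python
import pandas as pd

duration = pd.Series([5, 12, 24, 25, 40, 180, 260, 600])  # hypothetical seconds

short = duration < 25  # True for calls under 25 seconds
# The mean of a boolean Series is the fraction of True values.
print(f"{short.mean():.1%} of calls lasted under 25 seconds")
print(duration.median())  # typical call length in seconds
```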


The correlation graph of yes and no answers against the duration of calls and the number of calls made shows that when the number of calls is five or fewer, people are more likely to buy the banking product through the direct marketing campaign.
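The five-calls observation can be checked by comparing subscription rates for clients contacted five times or fewer against everyone else. A minimal sketch on made-up rows; few_calls is a hypothetical helper column.

```python
import pandas as pd

df = pd.DataFrame({
    "campaign": [1, 1, 2, 3, 5, 6, 8, 12],  # hypothetical contact counts
    "y": ["yes", "no", "yes", "no", "yes", "no", "no", "no"],
})

# Hypothetical flag: contacted five times or fewer during this campaign.
df["few_calls"] = df["campaign"] <= 5
# Mean of a yes/no indicator = share of clients who subscribed in each group.
rate = df["y"].eq("yes").groupby(df["few_calls"]).mean()
print(rate)
```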

Previous Outcomes

What were the outcomes of the previous campaign? This question is essential, since all the provided data comes from the previous campaign. We analyze the columns that are essential for marketing to gain insight into the right customer.
The distribution of customers who made a deposit, with respect to job and education, shows that people with a university degree bought the product more often and are more likely to buy it.
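A sketch of how the education comparison can be tabulated: a crosstab with totals for the raw counts, and a grouped mean for the subscription rate per education level. The rows are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "education": ["university.degree", "university.degree", "high.school",
                  "basic.9y", "university.degree", "high.school", "basic.9y"],
    "y": ["yes", "yes", "no", "no", "no", "yes", "no"],
})  # hypothetical rows

# Clients per education level and answer, with row/column totals ("All").
table = pd.crosstab(df["education"], df["y"], margins=True)
print(table)

# Subscription rate per education level, highest first.
rate = df["y"].eq("yes").groupby(df["education"]).mean().sort_values(ascending=False)
print(rate)
```

Rates matter more than raw counts here: a large group can have the most depositors simply because it has the most clients.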

CONCLUSION

Customers aged thirty to forty-five who are married, currently working and hold a university degree should be the main target for the product the bank offers. The direct marketing campaign should target more clients with this profile, or at least start with them, to save time and reduce the cost of the campaign.
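The recommended profile can be expressed as a boolean filter, which also lets the campaign team compare the segment's subscription rate with the overall rate before committing to it. A minimal sketch on made-up rows; treating "currently working" as any job except the hypothetical NOT_WORKING set is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [33, 41, 29, 38, 52, 44, 36],
    "marital": ["married", "married", "single", "married", "married", "married", "divorced"],
    "job": ["admin.", "technician", "student", "unemployed", "management", "services", "admin."],
    "education": ["university.degree", "university.degree", "high.school",
                  "university.degree", "university.degree", "university.degree", "basic.9y"],
    "y": ["yes", "yes", "no", "no", "no", "no", "no"],
})  # hypothetical rows

# Hypothetical reading of "currently working": any job except these labels.
NOT_WORKING = {"unemployed", "student", "retired", "unknown"}

target = (
    df["age"].between(30, 45)              # inclusive on both ends
    & df["marital"].eq("married")
    & ~df["job"].isin(NOT_WORKING)
    & df["education"].eq("university.degree")
)

segment = df[target]
print(f"segment size: {len(segment)} of {len(df)} clients")
print(f"subscription rate in segment: {segment['y'].eq('yes').mean():.0%}")
print(f"overall subscription rate:    {df['y'].eq('yes').mean():.0%}")
```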


Author: Muhammad Saad 









