The Impact of Covid 19 on Mental Health¶

Utilising Survey Data Collection, Data Visualisation, Univariate and Multivariate Analysis - Based on 3 Research Questions¶

HealthyData_Social_Media_Kit_Twitter.png

Introduction and Objective¶

The World Health Organization (WHO) conceptualizes mental health as a “state of well-being in which the individual realizes his or her own abilities, can cope with the normal stresses of life, can work productively and fruitfully, and is able to make a contribution to his or her community”.

However, the resources allocated by countries to tackle this burden are insufficient, are inequitably distributed, and, at times, inefficiently used. Together, this has led to a treatment gap that, in many countries, is more than 70%. The stigma, social exclusion, and discrimination that occur around people with mental disorders compound the situation.

The WHO (World Health Organisation) have highlighted the urgent need to transform mental health and mental health care.   Even before the COVID-19 pandemic, just a small fraction of people in need had access to effective, affordable and quality mental health care.

Hence, our objective was to research if there’s a demand in mental health cases and/ or, if the general healthcare system was coping during the Covid 19 crisis.  The outcome would enable us to make high level decisions on Smart healthcare, that could potentially be used to deal with mental health cases i.e. tele therapy with a psychotherapist or even a smart mental health bot. Naturally, reducing the backlog of general healthcare services [1 and 2]

Thus, we conducted a survey-based investigation to address 3 specific research questions, as can be seen below.

Role and Background¶

Sample and Population¶

The samples were collected randomly (probability sampling), hence no specific procedure was followed. This allows us to make strong statistical inferences about the population as a whole.

It was important to obtain a global perspective, to how people coped all over the world. A designated Slack forum provided this opportunity to collect global data. However, our first attempt highlighted how slow the response rate was. More critically, the average amount of responses collected by participants was low. We required a larger sample, that more accurately represented the characteristics of the corresponding population. Hence, as time was of the essence, we decided on a paid option (Pollfish) - where we collected a larger sample, at a faster rate.

Explicitly Stated Research Questions¶

Research Q1: How have the general population coped with the pandemic, as a whole? Survey Questions 4,5,6,7 and 8.

Research Q2: What coping mechanisms have the general population adopted during the lockdown? – Survey Questions 9, 10 and 11.

Research Q3: Is the healthcare system in general, adequate to deal with people with mental health issues, especially during adverse advents such as a lock down? – Survey Questions 12 and 13.

Scope¶

  • Survey Question 1: As '65+' produced 0 responses, this option was removed from the analysis
  • Survey Question 2: As 'Prefer not to say' produced 0 responses, this option was removed fromthe analysis.
  • Survey Questions 7.1 and 7.3: As 'Never' produced 0 repsonses, this option was removed from the analysis.
  • Survey Questions 8.1, 8.2 amnd 8.3: As 'Very Satisfied' produced 0 responses, this option was removed from the analysis.
  • All survey questions in general: All 'Other' options produced 0 responses, hence this option was removed from the analysis.

We didn't use the below key variables (surplus to requirements).

  • Survey Question 8.2: 'How satisfied were you with daily life activities?'
  • Survey Question 8.3: 'How satisifed were you with new activities?'

Domain Concepts¶

This survey has generated quanitative statistics (through the use of encoding, for key variables) to assess a qualitative concept i.e. answer relevant research questions.

Survey Design¶

The survey is structured into two question groups, encompassing 3 pages:

  • The first page consists of the 'Introduction and Privacy Notice'
  • The second page consists of the 'Socio-economic and Demographic Information' (Questions 1-3 inclusive)
  • The third page consists of the main survey questions (Questions 4-13 inclusive)

Three Socio-economic and Demographic Information questions (Q1, Q2 and Q3), One dichotomous question (Q4), three checkbox (multiple choice) questions (Q5, Q10, Q11), three likert scale questions (Q6,Q7 and Q8). Questions 9, 12 and 13 are general closed questions.

Operational Definitions¶

The key variables or measurables are the likert scale questions. This is where we'll assign a measurable scale to these categorical questions i.e numerical encoding in Python. We gained inspiration for these key variables, from reviewing research papers on mental health during the Covid crisis [2, 3, 4]

Data¶

Justification of Good Data¶

The survey was created in Jisc online surveys software and the survey link was used to collect responses in the paid survey platform. The final responses were acquired from the Jisc online surveys software, after converting to comma separated values format (50kb/ 102 responses). Google Forms can perform similar tasks.

A well written survey question will mean the same thing to all respondents. It was important to avoid bias and loaded words. Measuring behaviour with symmetrical and balanced likert scales is key.  They should contain an equal number of positive and negative responses, within the distance between each item being the same.

This way, we'll obtain quality data that are relevant to their related research questions.

Pre-Processing¶

In [2]:
# importing libraries
import pandas as pd, numpy as np
import statistics


# to ignore any warning messages.
import warnings
warnings.filterwarnings('ignore')

# for visualization
import matplotlib
import matplotlib.pyplot as plt, seaborn as sns

# for unit testing
from unittest.mock import patch, Mock
import unittest

import re
import nltk
nltk.download('stopwords')
import matplotlib.pyplot as plt
pd.options.display.max_colwidth = 200
%matplotlib inline
import os
import textwrap

from platform import python_version
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/stephenmalcolm/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
In [3]:
# Version information of libraries
print('Pandas Version: ', pd.__version__)
print('Numpy Version: ', np.__version__)
print('Matplotlib Version: ', matplotlib.__version__)
print('Seaborn Version: ', sns.__version__)
print('Python Version: ', python_version())
Pandas Version:  1.4.4
Numpy Version:  1.24.4
Matplotlib Version:  3.7.1
Seaborn Version:  0.12.2
Python Version:  3.8.16
In [4]:
#first, check we're using the right file type/encoder i.e. it should be UTF-8
#this ensures, it's relatively clean, and can be read and organised (as opposed to some other unusable formats)
import pandas as pd
import csv
data = open('mental-health-survey.csv', 'r')
data
Out[4]:
<_io.TextIOWrapper name='mental-health-survey.csv' mode='r' encoding='UTF-8'>
In [5]:
#we have the correct file encoder, as stated in the above output
#now, import 

import csv
df=pd.read_csv('mental-health-survey.csv')

df.head()._append(df.tail())
Out[5]:
Unique Response Number 1. To what age group do you belong? 2. Which gender do you most strongly identify with? 3. What is your region of residence? 4. Has the covid-19 lockdown affected your state of mind in any way? 5. Do you share a home with any of the following? 5.a. If you selected Other, please specify: 6. If you compare the pre-COVID-19 lockdown and the actual COVID-19 lockdown, how has: 6.1. Your amount of communication with family and friends changed? 6.2. Your financial status changed? ... 8.3. How satisfied were you with new activities? 9. How many hours per day are/ were you able to spend outside, on average during the lockdown? 10. Select if you have had any of these psychological reactions towards the covid-19 pandemic? 10.a. If you selected Other, please specify: 11. During the Covid 19 lockdown, which of the following helped you feel better? 11.a. If you selected Other, please specify: 12. Have you been tempted to access support from social groups during the covid-19 pandemic? 12.a. If you selected Other, please specify: 13. Do you feel that the Covid 19 pandemic has affected your access to healthcare for other conditions? 13.a. If you selected Other, please specify:
0 753624-753615-79015877 35-44 Male Africa Yes Your partner,Other Family Memebers NaN NaN More Than Usual Less Than Usual ... Satisfied 2-3 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacc... NaN Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study NaN I don't think I need social support NaN I didn't need access to healthcare during Covid 19 NaN
1 753624-753615-79019215 35-44 Male Europe Yes No one, I live alone NaN NaN More Than Usual Same as Usual ... Unsatisfied 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions NaN Doing exercise,Watching TV i.e. not the news,Doing remote work or study NaN I don't think I need social support NaN I didn't need access to healthcare during Covid 19 NaN
2 753624-753615-79015968 35-44 Male Asia Yes Your partner NaN NaN More Than Usual More Than Usual ... Neutral 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs NaN Doing exercise,Watching TV i.e. not the news,Doing housework,Doing remote work or study NaN No - I became unable to use the social support groups I was using before the pandemic NaN I couldn't find over the counter medication in pharmacies or shops NaN
3 753624-753615-79020017 35-44 Male Europe Yes Your partner,Your child/ children,Other Family Memebers NaN NaN More Than Usual Less Than Usual ... Neutral 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs NaN Doing exercise,Watching/ Reading the news,Doing remote work or study NaN No - I think I need social support but cannot access them to begin with NaN I needed a GP appointment and couldn't get one at all NaN
4 753624-753615-79020109 Under 18 Male Asia Yes Other Family Memebers NaN NaN More Than Usual Same as Usual ... Unsatisfied 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chron... NaN Doing exercise,Watching TV i.e. not the news,Doing housework,Playing music NaN No - I think I need social support but cannot access them to begin with NaN I didn't need access to healthcare during Covid 19 NaN
97 753624-753615-79060123 Under 18 Male Africa Yes Other Family Memebers,One or multiple housemates NaN NaN Same as Usual Less Than Usual ... Unsatisfied 0-1 hours per day Difficulty concentrating and making decisions NaN Watching/ Reading the news,Doing housework,Talking to friends/ family,Doing remote work or study NaN No - I think I need social support but cannot access them to begin with NaN I didn't need access to healthcare during Covid 19 NaN
98 753624-753615-79060180 25-34 Female Africa No No one, I live alone NaN NaN More Than Usual More Than Usual ... Neutral 0-1 hours per day Increased use of alcohol, tobacco or other drugs NaN Doing exercise,Watching/ Reading the news,Doing housework,Playing music NaN No - I became unable to use the social support groups I was using before the pandemic NaN I didn't need access to healthcare during Covid 19 NaN
99 753624-753615-79060308 25-34 Male Europe Yes Other Family Memebers NaN NaN Same as Usual Same as Usual ... Neutral 0-1 hours per day None of the above NaN Watching TV i.e. not the news,Doing housework,Talking to friends/ family,Doing remote work or study NaN I don't think I need social support NaN Planned procedures that I required, were cancelled NaN
100 753624-753615-79060402 35-44 Female South America Yes Your partner,Your child/ children,Other Family Memebers,One or multiple housemates NaN NaN More Than Usual Less Than Usual ... Satisfied 0-1 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,None of the above NaN Watching TV i.e. not the news,Watching/ Reading the news,Doing remote work or study,Playing music NaN No - I think I need social support but cannot access them to begin with NaN Planned procedures that I required, were cancelled NaN
101 753624-753615-79060522 25-34 Male Asia Yes Your partner,Your child/ children,Other Family Memebers NaN NaN Less Than Usual Less Than Usual ... Neutral 0-1 hours per day Difficulty concentrating and making decisions NaN Talking to friends/ family,Doing remote work or study,Playing music NaN No - I think I need social support but cannot access them to begin with NaN My regular hospital-based treatment was disrupted NaN

10 rows × 27 columns

In [6]:
#after reviewing the analysis from the survey column, we should have 102 respondants (rows) and 10 questions 
#(columns) meaning, that all of our survey data is now stored in that DataFrame and ready for analysis
#those extra columns (see output below), highlights we have to drop several columns
#let's address further into the preprocessing exercise
In [7]:
#Let's check the number of rows and columns
df.shape
Out[7]:
(102, 27)
In [8]:
#Let's check for missing values (Nan) and provide a copy or list of the original questions
#From above output, we have 27 columns
df.isnull().sum()
Out[8]:
Unique Response Number                                                                                       0
1. To what age group do you belong?                                                                          0
2. Which gender do you most strongly identify with?                                                          0
3. What is your region of residence?                                                                         0
4. Has the covid-19 lockdown affected your state of mind in any way?                                         0
5. Do you share a home with any of the following?                                                            0
5.a. If you selected Other, please specify:                                                                102
6. If you compare the pre-COVID-19 lockdown and the actual COVID-19 lockdown, how has:                     102
6.1. Your amount of communication with family and friends changed?                                           0
6.2. Your financial status changed?                                                                          0
7. During the Covid 19 lockdown, please state the effects and restraints on your wellbeing:                102
7.1. How often did you feel unhappy/ depressed?                                                              0
7.2. How often did you feel anxious/ a loss of confidence?                                                   0
7.3. How often did you feel under stress?                                                                    0
8. During the Covid 19 lockdown, please state the effects and restraints on your daily life:               102
8.1. How satisfied were you with your study and/ or work?                                                    0
8.2. How satisfied were you with daily life activities?                                                      0
8.3. How satisfied were you with new activities?                                                             0
9. How many hours per day are/ were you able to spend outside, on average during the lockdown?               0
10. Select if you have had any of these psychological reactions towards the covid-19 pandemic?               0
10.a. If you selected Other, please specify:                                                               102
11. During the Covid 19 lockdown, which of the following helped you feel better?                             0
11.a. If you selected Other, please specify:                                                               102
12. Have you been tempted to access support from social groups during the covid-19 pandemic?                 0
12.a. If you selected Other, please specify:                                                               102
13. Do you feel that the Covid 19 pandemic has affected your access to healthcare for other conditions?      0
13.a. If you selected Other, please specify:                                                               102
dtype: int64
In [9]:
#lets create a data type function for displaying null values and data types
def finding_null_value(df):

    total_null = df.isnull().sum()           

    total_percent = (df.isnull().sum()/df.isnull().count()*100) 
    
    
    new_var = pd.concat([total_null, total_percent], axis=1, keys=['Total_null', 'Total_percent(%)'])


    types_array = []
    for column in df.columns:
        dtype = str(df[column].dtype)
        types_array.append(dtype)
    new_var['Types'] = types_array

    return(np.transpose(new_var))
In [10]:
#The data type function displays:
# 1. Total null values 
# 2. Total percentage 
# 3. Display types of every feature (don't need to use Dtypes command, however, we'll still demonstrate below) 
finding_null_value(df)
Out[10]:
Unique Response Number 1. To what age group do you belong? 2. Which gender do you most strongly identify with? 3. What is your region of residence? 4. Has the covid-19 lockdown affected your state of mind in any way? 5. Do you share a home with any of the following? 5.a. If you selected Other, please specify: 6. If you compare the pre-COVID-19 lockdown and the actual COVID-19 lockdown, how has: 6.1. Your amount of communication with family and friends changed? 6.2. Your financial status changed? ... 8.3. How satisfied were you with new activities? 9. How many hours per day are/ were you able to spend outside, on average during the lockdown? 10. Select if you have had any of these psychological reactions towards the covid-19 pandemic? 10.a. If you selected Other, please specify: 11. During the Covid 19 lockdown, which of the following helped you feel better? 11.a. If you selected Other, please specify: 12. Have you been tempted to access support from social groups during the covid-19 pandemic? 12.a. If you selected Other, please specify: 13. Do you feel that the Covid 19 pandemic has affected your access to healthcare for other conditions? 13.a. If you selected Other, please specify:
Total_null 0 0 0 0 0 0 102 102 0 0 ... 0 0 0 102 0 102 0 102 0 102
Total_percent(%) 0.0 0.0 0.0 0.0 0.0 0.0 100.0 100.0 0.0 0.0 ... 0.0 0.0 0.0 100.0 0.0 100.0 0.0 100.0 0.0 100.0
Types object object object object object object float64 float64 object object ... object object object float64 object float64 object float64 object float64

3 rows × 27 columns

In [11]:
#Need to rename headers, to make it tidier and coherent

df.columns = ['URN','Age', 'Gender', 'Region', 'SOM_Affected', 'Home Sharing', 'Q5a', 'Q6', 
              'Impact on Communication', 'Impact on Financial Status', 'Q7', 'Unhappy_or_Depressed',
              'Anxious or Low Confidence', 'Stressed', 'Q8', 'Study or Work Satisfaction', 'Routine Satisfaction',
              'New Activities Satisfaction', 'Hours Spent Outside', 'Psychological_Reactions', 'Q10a',
              'Supported_Activities', 'Q11a', 'Social_Support', 'Q12a', 'Healthcare_Access', 'Q13a'] 
df.head()
Out[11]:
URN Age Gender Region SOM_Affected Home Sharing Q5a Q6 Impact on Communication Impact on Financial Status ... New Activities Satisfaction Hours Spent Outside Psychological_Reactions Q10a Supported_Activities Q11a Social_Support Q12a Healthcare_Access Q13a
0 753624-753615-79015877 35-44 Male Africa Yes Your partner,Other Family Memebers NaN NaN More Than Usual Less Than Usual ... Satisfied 2-3 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacc... NaN Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study NaN I don't think I need social support NaN I didn't need access to healthcare during Covid 19 NaN
1 753624-753615-79019215 35-44 Male Europe Yes No one, I live alone NaN NaN More Than Usual Same as Usual ... Unsatisfied 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions NaN Doing exercise,Watching TV i.e. not the news,Doing remote work or study NaN I don't think I need social support NaN I didn't need access to healthcare during Covid 19 NaN
2 753624-753615-79015968 35-44 Male Asia Yes Your partner NaN NaN More Than Usual More Than Usual ... Neutral 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs NaN Doing exercise,Watching TV i.e. not the news,Doing housework,Doing remote work or study NaN No - I became unable to use the social support groups I was using before the pandemic NaN I couldn't find over the counter medication in pharmacies or shops NaN
3 753624-753615-79020017 35-44 Male Europe Yes Your partner,Your child/ children,Other Family Memebers NaN NaN More Than Usual Less Than Usual ... Neutral 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs NaN Doing exercise,Watching/ Reading the news,Doing remote work or study NaN No - I think I need social support but cannot access them to begin with NaN I needed a GP appointment and couldn't get one at all NaN
4 753624-753615-79020109 Under 18 Male Asia Yes Other Family Memebers NaN NaN More Than Usual Same as Usual ... Unsatisfied 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chron... NaN Doing exercise,Watching TV i.e. not the news,Doing housework,Playing music NaN No - I think I need social support but cannot access them to begin with NaN I didn't need access to healthcare during Covid 19 NaN

5 rows × 27 columns

In [12]:
#The respondents didn't bother populating any of the 'Other' options. Hence, we have missing values here
#Questions 6, 7 and 8 are general overview questions, that don't require an answer. 
#Hence, these are Q5a, Q6, Q7, Q8, Q10a, Q11a, Q12a and Q13a
#Drop these columns

df = df.dropna(axis=1)
In [13]:
#We should delete the unique response number/ URN column
#it serves no purpose

df.drop('URN', axis=1, inplace=True)
df
Out[13]:
Age Gender Region SOM_Affected Home Sharing Impact on Communication Impact on Financial Status Unhappy_or_Depressed Anxious or Low Confidence Stressed Study or Work Satisfaction Routine Satisfaction New Activities Satisfaction Hours Spent Outside Psychological_Reactions Supported_Activities Social_Support Healthcare_Access
0 35-44 Male Africa Yes Your partner,Other Family Memebers More Than Usual Less Than Usual Sometimes Often Sometimes Unsatisfied Neutral Satisfied 2-3 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacc... Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study I don't think I need social support I didn't need access to healthcare during Covid 19
1 35-44 Male Europe Yes No one, I live alone More Than Usual Same as Usual Often Sometimes Sometimes Satisfied Unsatisfied Unsatisfied 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions Doing exercise,Watching TV i.e. not the news,Doing remote work or study I don't think I need social support I didn't need access to healthcare during Covid 19
2 35-44 Male Asia Yes Your partner More Than Usual More Than Usual Sometimes Often Sometimes Unsatisfied Neutral Neutral 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching TV i.e. not the news,Doing housework,Doing remote work or study No - I became unable to use the social support groups I was using before the pandemic I couldn't find over the counter medication in pharmacies or shops
3 35-44 Male Europe Yes Your partner,Your child/ children,Other Family Memebers More Than Usual Less Than Usual Sometimes Sometimes Sometimes Unsatisfied Unsatisfied Neutral 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching/ Reading the news,Doing remote work or study No - I think I need social support but cannot access them to begin with I needed a GP appointment and couldn't get one at all
4 Under 18 Male Asia Yes Other Family Memebers More Than Usual Same as Usual Often Often Often Satisfied Unsatisfied Unsatisfied 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chron... Doing exercise,Watching TV i.e. not the news,Doing housework,Playing music No - I think I need social support but cannot access them to begin with I didn't need access to healthcare during Covid 19
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 Under 18 Male Africa Yes Other Family Memebers,One or multiple housemates Same as Usual Less Than Usual Often Often Sometimes Neutral Unsatisfied Unsatisfied 0-1 hours per day Difficulty concentrating and making decisions Watching/ Reading the news,Doing housework,Talking to friends/ family,Doing remote work or study No - I think I need social support but cannot access them to begin with I didn't need access to healthcare during Covid 19
98 25-34 Female Africa No No one, I live alone More Than Usual More Than Usual Seldom Sometimes Seldom Neutral Neutral Neutral 0-1 hours per day Increased use of alcohol, tobacco or other drugs Doing exercise,Watching/ Reading the news,Doing housework,Playing music No - I became unable to use the social support groups I was using before the pandemic I didn't need access to healthcare during Covid 19
99 25-34 Male Europe Yes Other Family Memebers Same as Usual Same as Usual Sometimes Often Often Neutral Neutral Neutral 0-1 hours per day None of the above Watching TV i.e. not the news,Doing housework,Talking to friends/ family,Doing remote work or study I don't think I need social support Planned procedures that I required, were cancelled
100 35-44 Female South America Yes Your partner,Your child/ children,Other Family Memebers,One or multiple housemates More Than Usual Less Than Usual Seldom Sometimes Sometimes Satisfied Neutral Satisfied 0-1 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,None of the above Watching TV i.e. not the news,Watching/ Reading the news,Doing remote work or study,Playing music No - I think I need social support but cannot access them to begin with Planned procedures that I required, were cancelled
101 25-34 Male Asia Yes Your partner,Your child/ children,Other Family Memebers Less Than Usual Less Than Usual Seldom Sometimes Often Neutral Satisfied Neutral 0-1 hours per day Difficulty concentrating and making decisions Talking to friends/ family,Doing remote work or study,Playing music No - I think I need social support but cannot access them to begin with My regular hospital-based treatment was disrupted

102 rows × 18 columns

In [14]:
#Very quick and dirty 'summary of statistics'
#Statistical summary for categorical or string variables will show “count”, “unique”, “top”, and “freq”.

df.describe(include='all').T
Out[14]:
count unique top freq
Age 102 6 35-44 35
Gender 102 2 Male 56
Region 102 6 Europe 26
SOM_Affected 102 2 Yes 73
Home Sharing 102 15 Other Family Memebers 26
Impact on Communication 102 3 More Than Usual 35
Impact on Financial Status 102 3 Less Than Usual 48
Unhappy_or_Depressed 102 4 Sometimes 44
Anxious or Low Confidence 102 5 Often 43
Stressed 102 4 Sometimes 48
Study or Work Satisfaction 102 4 Neutral 39
Routine Satisfaction 102 4 Neutral 45
New Activities Satisfaction 102 4 Unsatisfied 41
Hours Spent Outside 102 5 4-5 hours per day 38
Psychological_Reactions 102 33 None of the above 23
Supported_Activities 102 54 I didn't need help 14
Social_Support 102 5 I don't think I need social support 57
Healthcare_Access 102 7 I didn't need access to healthcare during Covid 19 54
In [15]:
#Preparing a separate dataframe for later

df_new=df.copy()

Set Up Categories¶

In [16]:
#Present types of categories here or use function above

df.dtypes
Out[16]:
Age                            object
Gender                         object
Region                         object
SOM_Affected                   object
Home Sharing                   object
Impact on Communication        object
Impact on Financial Status     object
Unhappy_or_Depressed           object
Anxious or Low Confidence      object
Stressed                       object
Study or Work Satisfaction     object
Routine Satisfaction           object
New Activities Satisfaction    object
Hours Spent Outside            object
Psychological_Reactions        object
Supported_Activities           object
Social_Support                 object
Healthcare_Access              object
dtype: object
In [17]:
#Set the nominal (non-ordered categorical) data types from object to category type

df['Gender']=df['Gender'].astype('category')
df['Region']=df['Region'].astype('category')
df.dtypes
Out[17]:
Age                              object
Gender                         category
Region                         category
SOM_Affected                     object
Home Sharing                     object
Impact on Communication          object
Impact on Financial Status       object
Unhappy_or_Depressed             object
Anxious or Low Confidence        object
Stressed                         object
Study or Work Satisfaction       object
Routine Satisfaction             object
New Activities Satisfaction      object
Hours Spent Outside              object
Psychological_Reactions          object
Supported_Activities             object
Social_Support                   object
Healthcare_Access                object
dtype: object
In [18]:
#Verify dtype
#This gives us an extra step to check that there's no mistakes or no unexpected values in our data
df['Gender'].dtype
Out[18]:
CategoricalDtype(categories=['Female', 'Male'], ordered=False)
In [19]:
#Perform the same for the Region column
df['Region'].dtype
Out[19]:
CategoricalDtype(categories=['Africa', 'Asia', 'Europe', 'North America', 'Oceania',
                  'South America'],
, ordered=False)
In [20]:
#Ordinal (ordinal categorical)/ Questions 6.1, 6.2 and 7.1, 7.2, 7.3 and 8.1, 8.2, 8.3
#Using a likert scale

#Sometimes when setting categorical variables, it is also necessary to specify all of the categories explicitly.
#This is very often the case for ordinal variables, as well as specifying all of the categories that are permitted 
#in a particular ordinal variable, we also need to specify what ordering they follow. This is the case for the 
#key variables in this dataset, which relates to 8 different columns/questions.

#Q6.1 - Set up new variable

friends_family = ['same as usual', 'less than usual', 'more than usual']

#Set correct data types
df['Impact on Communication'] = df['Impact on Communication'].astype(pd.CategoricalDtype(ordered=True, categories=friends_family))

#Inspect dtype
df['Impact on Communication'].dtype
Out[20]:
CategoricalDtype(categories=['same as usual', 'less than usual', 'more than usual'], ordered=True)
In [21]:
#Ordinal (ordinal categorical)
#Q6.2 - Set up new variable

money = ['same as usual', 'less than usual', 'more than usual']

#Set correct data types
df['Impact on Financial Status'] = df['Impact on Financial Status'].astype(pd.CategoricalDtype(ordered=True, categories=money))

#Inspect dtype
df['Impact on Communication'].dtype
Out[21]:
CategoricalDtype(categories=['same as usual', 'less than usual', 'more than usual'], ordered=True)
In [22]:
#Ordinal (ordinal categorical)
#Q7.1 - Set up new variable

unhappy = ['almost always', 'often', 'sometimes', 'seldom', 'never']

#Set correct data types
df['Unhappy_or_Depressed'] = df['Unhappy_or_Depressed'].astype(pd.CategoricalDtype(ordered=True, categories=unhappy))

#Inspect dtype
df['Unhappy_or_Depressed'].dtype
Out[22]:
CategoricalDtype(categories=['almost always', 'often', 'sometimes', 'seldom', 'never'], ordered=True)
In [23]:
#Ordinal (ordinal categorical)
#Q7.2 - Set up new variable

anxious = ['almost always', 'often', 'sometimes', 'seldom', 'never']

#Set correct data types
df['Anxious or Low Confidence'] = df['Anxious or Low Confidence'].astype(pd.CategoricalDtype(ordered=True, categories=anxious))

#Inspect dtype
df['Anxious or Low Confidence'].dtype
Out[23]:
CategoricalDtype(categories=['almost always', 'often', 'sometimes', 'seldom', 'never'], ordered=True)
In [24]:
#Ordinal (ordinal categorical)
#Q7.3 - Set up new variable

stress = ['almost always', 'often', 'sometimes', 'seldom', 'never']

#Set correct data types
df['Stressed'] = df['Unhappy_or_Depressed'].astype(pd.CategoricalDtype(ordered=True, categories=stress))

#Inspect dtype
df['Unhappy_or_Depressed'].dtype
Out[24]:
CategoricalDtype(categories=['almost always', 'often', 'sometimes', 'seldom', 'never'], ordered=True)
In [25]:
#Ordinal (ordinal categorical)
#Q8.1 - Set up new variable

study_work = ['very unsatisfied', 'unsatisfied', 'neutral', 'satisfied', 'very satisfied']

#Set correct data types
df['Routine Satisfaction'] = df['Routine Satisfaction'].astype(pd.CategoricalDtype(ordered=True, categories=study_work))

#Inspect dtype
df['Routine Satisfaction'].dtype
Out[25]:
CategoricalDtype(categories=['very unsatisfied', 'unsatisfied', 'neutral', 'satisfied',
                  'very satisfied'],
, ordered=True)
In [26]:
#Ordinal (ordinal categorical)
#Q8.2 - Set up new variable

routine = ['very unsatisfied', 'unsatisfied', 'neutral', 'satisfied', 'very satisfied']

#Set correct data types
df['Study or Work Satisfaction'] = df['Study or Work Satisfaction'].astype(pd.CategoricalDtype(ordered=True, categories=routine))

#Inspect dtype
df['Study or Work Satisfaction'].dtype
Out[26]:
CategoricalDtype(categories=['very unsatisfied', 'unsatisfied', 'neutral', 'satisfied',
                  'very satisfied'],
, ordered=True)
In [27]:
#Ordinal (ordinal categorical)
#Q8.3 (Likert)- Set up new variable

new_activity = ['very unsatisfied', 'unsatisfied', 'neutral', 'satisfied', 'very satisfied']

#Set correct data types
df['New Activities Satisfaction'] = df['New Activities Satisfaction'].astype(pd.CategoricalDtype(ordered=True, categories=new_activity))

#Inspect dtype
df['New Activities Satisfaction'].dtype
Out[27]:
CategoricalDtype(categories=['very unsatisfied', 'unsatisfied', 'neutral', 'satisfied',
                  'very satisfied'],
, ordered=True)
In [28]:
#Ordinal (ordinal categorical)
#Q5 (Checkbox) - Set up new variable
#This particular checkbox must be ordered, or the barchart wont be interpreted correctly

outside = ['0-1 hours per day', '2-3 hours per day', '4-5 hours per day', '6-8 hours per day', 'More than 8 hours per day']

#Set correct data types
df['Hours Spent Outside'] = df['Hours Spent Outside'].astype(pd.CategoricalDtype(ordered=True, categories=outside))

#Inspect dtype
df['Hours Spent Outside'].dtype
Out[28]:
CategoricalDtype(categories=['0-1 hours per day', '2-3 hours per day',
                  '4-5 hours per day', '6-8 hours per day',
                  'More than 8 hours per day'],
, ordered=True)
In [29]:
#Ordinal (ordinal categorical)
#Q1 - Set up new variable

age = ['Under 18', '18-24', '25-34', '35-44', '45-54', '55-64', '65+']

#Set correct data types
df['Age'] = df['Age'].astype(pd.CategoricalDtype(ordered=True, categories=age))

#Inspect dtype
df['Age'].dtype
Out[29]:
CategoricalDtype(categories=['Under 18', '18-24', '25-34', '35-44', '45-54', '55-64',
                  '65+'],
, ordered=True)
In [30]:
#We can see below, the specified objects that were once object, have now been transformed into category

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 102 entries, 0 to 101
Data columns (total 18 columns):
 #   Column                       Non-Null Count  Dtype   
---  ------                       --------------  -----   
 0   Age                          102 non-null    category
 1   Gender                       102 non-null    category
 2   Region                       102 non-null    category
 3   SOM_Affected                 102 non-null    object  
 4   Home Sharing                 102 non-null    object  
 5   Impact on Communication      0 non-null      category
 6   Impact on Financial Status   0 non-null      category
 7   Unhappy_or_Depressed         0 non-null      category
 8   Anxious or Low Confidence    0 non-null      category
 9   Stressed                     0 non-null      category
 10  Study or Work Satisfaction   0 non-null      category
 11  Routine Satisfaction         0 non-null      category
 12  New Activities Satisfaction  0 non-null      category
 13  Hours Spent Outside          102 non-null    category
 14  Psychological_Reactions      102 non-null    object  
 15  Supported_Activities         102 non-null    object  
 16  Social_Support               102 non-null    object  
 17  Healthcare_Access            102 non-null    object  
dtypes: category(12), object(6)
memory usage: 8.5+ KB

Summary of Data Overview and Pre-Processing¶

Upon direct examination of the data types (dtypes) pertaining to the aforementioned columns, a discernible transformation is apparent. The columns or variables that necessitated prior modifications have transitioned from their initial "object" data type to a more refined "category" classification, distinctively categorized into either an unordered or ordered format, as delineated in the outputs presented. This transition entails a distinct enumeration of the explicit categories encompassing the spectrum of potential data values, as evidently demonstrated in the aforementioned outputs. This procedural addition serves the purpose of introducing a supplementary stage of data validation, facilitating the identification and rectification of any inadvertent errors or unexpected values that may have emerged within our dataset.

This supplementary procedure proves invaluable in the context of visualization. By encoding an ordinal variable with its inherent ordering, both Pandas and Matplotlib can accurately represent the data's natural progression. Devoid of this encoding, the visualization would default to an alphabetical arrangement, a scenario prone to perplexity. The preliminary pre-processing undertaken herein indeed confers distinct advantages, particularly in terms of enhancing the interpretability and coherence of visual representations.*

As part of the tidy data rules, we removed the URN column and null/ NaN values using the appropiate commands. As they serve no purpose.

From the quick and dirty 'data summary statistics'below, it can be seen we have 102 responses to work with. The 'top' column provides the mode (observations with highest frequency) for that particular survey question.

The summary of demographics and key variables are present in the table below:

title

Exploratory Data Analysis (EDA)¶

In [31]:
#Before we dive in to the EDA, first, let's implement an error handling procedure to handle exceptional 
#conditions, with a 'Try Except' statement.
#Try Except

def counts(col):
    try:
        counts_new=df_new[col].value_counts(sort=False)
        return counts_new
    except:
        print("Write appropriate column Name")
In [32]:
#Checking the Gender categories
counts("Gender")
Out[32]:
Male      56
Female    46
Name: Gender, dtype: int64
In [33]:
#I will deliberately use the wrong column which is not in the dataset. I want to verify that the "Try Except" code 
#works. And you can see that this message outputs "Write appropriate column Name".
counts("Education")
Write appropriate column Name
In [34]:
check = counts("Stressed") #This is real variable/ column name
check
Out[34]:
Sometimes        48
Often            24
Almost Always     6
Seldom           24
Name: Stressed, dtype: int64
In [35]:
#Lets run a Unit Test for the 'Stressed' variable
from pandas._testing import assert_frame_equal 
def test_counts(col):
    df_counts = pd.DataFrame(counts(col))
    df_ref_counts= pd.DataFrame(check)
    return assert_frame_equal(df_counts,df_ref_counts)
In [36]:
c = input("Column Name :")
test_counts(c) 

# After running block, use the Correct column name in the blank cell
#i.e. as we're verifying the 'Stressed' variable, type Stressed in the empty cell below and PRESS ENTER
Column Name :Stressed
In [37]:
#INTERRUPT KERNAL ONLY IF REQUIRED i.e. IF INCORRECT COLUMN NAME IS ENTERED

Frequency Distribution¶

In [38]:
#Impact on Financial Status categories
counts("Impact on Financial Status")
Out[38]:
Less Than Usual    48
Same as Usual      42
More Than Usual    12
Name: Impact on Financial Status, dtype: int64
In [39]:
#Impact on Communication categories
counts("Impact on Communication")
Out[39]:
More Than Usual    35
Less Than Usual    32
Same as Usual      35
Name: Impact on Communication, dtype: int64
In [40]:
#Unhappy or Depressed categories
df_new['Unhappy_or_Depressed'].value_counts()
Out[40]:
Sometimes        44
Often            27
Seldom           24
Almost Always     7
Name: Unhappy_or_Depressed, dtype: int64
In [41]:
#Study or Work Satisfaction categories
df_new['Study or Work Satisfaction'].value_counts()
Out[41]:
Neutral             39
Unsatisfied         35
Satisfied           22
Very Unsatisfied     6
Name: Study or Work Satisfaction, dtype: int64
In [42]:
#checking the Region categories
df_new['Region'].value_counts()
Out[42]:
Europe           26
Africa           19
North America    18
Asia             17
South America    13
Oceania           9
Name: Region, dtype: int64
In [43]:
#State of Mind Category
df_new['SOM_Affected'].value_counts()
Out[43]:
Yes    73
No     29
Name: SOM_Affected, dtype: int64

Dichotomy Question. Q4 - Has the covid-19 lockdown affected your state of mind in any way? ¶

In [44]:
#define a variable for 'yes' respondents
said_yes = df[df['SOM_Affected'] == 'Yes']
said_yes.shape
Out[44]:
(73, 18)
In [45]:
#The number of people who answered ‘Yes’ to the State of Mind question can be verfied by running a quick 
#value_counts() on this dataframe:

counts = df['SOM_Affected'].value_counts(sort=False)#Preserve order of ordinal variable
counts
Out[45]:
Yes    73
No     29
Name: SOM_Affected, dtype: int64
In [46]:
#Bar chart - State of Mind Affected with Covid 19 Lockdown
#count categories but do not sort by frequency

counts=df['SOM_Affected'].value_counts(sort=False)

sns.set(font_scale=1.0)
plt.figure(figsize=(10, 7),dpi=80)

plt.tick_params(labelsize=15);
plt.suptitle('State of Mind (SOM) Affected with Covid 19 Lockdown', fontsize=18)

ax=counts.plot.barh()
ax.set_xlabel('Frequency', fontsize=16)
ax.set_ylabel('Yes or No',fontsize=16)
plt.show()
In [47]:
#As we can see from the barchart above, and descriptive statistics below:
#Approximately 72% of the survey respondents stated the Covid 19 Lockdown affected their state of mind.

df['SOM_Affected'].describe()
Out[47]:
count     102
unique      2
top       Yes
freq       73
Name: SOM_Affected, dtype: object
In [48]:
#Normalised Stack Barchart - State of Mind(SOM) During the Covid 19 Lockdown

#This will adjust values measured on different scales to a common scale.
#Will also calculate percentages.

#*************************************************************#
#From those that stated yes, which represent males or females?#
#*************************************************************#

"""From the chart below, it's clear that Males suffered the most, with respect to State of Mind, 
during the Covid 19 lockdown."""

# Remove all 'prefer not to say' responses. None in our case
#df = df.loc[df['gender'] != 'prefer not to say']

# Generate normalised cross tabulation.
table = pd.crosstab(df['SOM_Affected'], df['Gender'], normalize='index')
# Sort by female responses and convert to percentage.
table = table.sort_values(by='Female', ascending=False) * 100

sns.set(font_scale=1.0)
plt.figure(figsize=(16,8),dpi=80)

ax = table.plot.barh(stacked=True)
ax.invert_yaxis()
ax.axvline(50, color='grey', linestyle='dashed', linewidth=1)

plt.tick_params(labelsize=15);
plt.suptitle('Percentage of State of Mind (SOM) Affected\nDuring the Covid 19 Lockdown', fontsize=18)


ax.set_xlabel('Percent', fontsize=16)
plt.ylabel('Did Covid 19 Lockdown\nAffect Our Repondents SOM?', multialignment='center', fontsize=16)

# Place legent outside plot axes.
ax.legend(bbox_to_anchor=(1.0, 1.0))

plt.show()

#*Reference 6 (refer to the foot of this notebook) was used to summarize the above.
<Figure size 1280x640 with 0 Axes>
In [49]:
#Cross Tabulation Table - Frequency of SOM Affected in Different Regions of the world

#The crosstab function builds a cross tabulation table that can portray the frequency with which different groups
#of data appear. It is normalized

table = pd.crosstab(df['Region'], df['SOM_Affected'], normalize='index')
# Sort by female responses and convert to percentage.
table = table.sort_values(by='Yes', ascending=False) * 100
table
Out[49]:
SOM_Affected No Yes
Region
Africa 21.052632 78.947368
North America 22.222222 77.777778
Asia 29.411765 70.588235
South America 30.769231 69.230769
Oceania 33.333333 66.666667
Europe 34.615385 65.384615
In [50]:
#Normalised Stack Barchart - Percentage of SOM Affected by REGION

#*************************************************************************************#
#Did some regions of the world suffer more than others, with respect to state of mind?#
#*************************************************************************************#

"""It's clear from the chart below and table above, that the African nation's state of mind was affected the most 
during the Covid 19 lockdown.
The European nation's state of mind was the least affected.
Deeper research is required on why this is and what factors came into play."""

#Remove all 'prefer not to say' responses. Not applicable here
#df = df.loc[df['gender'] != 'prefer not to say']

# Generate normalised cross tabulation.
table = pd.crosstab(df['Region'], df['SOM_Affected'], normalize='index')
# Sort by female responses and convert to percentage.
table = table.sort_values(by='Yes', ascending=False) * 100

sns.set(font_scale=1.0)
plt.figure(figsize=(16, 8),dpi=80)

ax = table.plot.barh(stacked=True)
ax.invert_yaxis()
ax.axvline(50, color='grey', linestyle='dashed', linewidth=1)

plt.tick_params(labelsize=15);
plt.suptitle('Percentage of State of Mind (SOM)\n Affected by Region', fontsize=18)

ax.set_xlabel('Percent', fontsize=16)
ax.set_ylabel('Regions of The World', fontsize=16)
# Place legent outside plot axes.
ax.legend(bbox_to_anchor=(1.0, 1.0))
plt.show()
<Figure size 1280x640 with 0 Axes>
In [51]:
table = pd.crosstab(df['Age'], df['SOM_Affected'], normalize='index')
# Sort by female responses and convert to percentage.
table = table.sort_values(by='Yes', ascending=False) * 100
table
Out[51]:
SOM_Affected No Yes
Age
Under 18 0.000000 100.000000
18-24 22.222222 77.777778
25-34 28.125000 71.875000
45-54 30.000000 70.000000
35-44 31.428571 68.571429
55-64 33.333333 66.666667
In [52]:
#Normalised Stack Barchart - Percentage of SOM Affected by AGE

#*****************************************************************************#
#Did certain age groups suffer more than others, with respect to state of mind?
#*****************************************************************************#

"""It's clear from the chart below and table above, there is a pattern. As the age bracket increases, our 
respondent's state of mind is less affected. Hence, younger people suffered the most and older people suffered the
least during the Covid 19 lockdown.
Perhaps there's more resiliance with the older generation, e.g experience/ handling of traumatic situations."""

#Remove all 'prefer not to say' responses. Not applicable here
#df = df.loc[df['gender'] != 'prefer not to say']

# Generate normalised cross tabulation.
table = pd.crosstab(df['Age'], df['SOM_Affected'], normalize='index')
# Sort by female responses and convert to percentage.
table = table.sort_values(by='Yes', ascending=False) * 100

sns.set(font_scale=1.0)
plt.figure(figsize=(18, 8), dpi=80)

ax = table.plot.barh(stacked=True)
ax.invert_yaxis()
ax.axvline(50, color='grey', linestyle='dashed', linewidth=1)

plt.tick_params(labelsize=15);
plt.suptitle('Percentage of State of Mind (SOM)\nAffected by Age', fontsize=18)

ax.set_xlabel('Percent', fontsize=16)
ax.set_ylabel('Age', fontsize=16)

# Place legent outside plot axes.
ax.legend(bbox_to_anchor=(1.0, 1.0))
plt.show()
<Figure size 1440x640 with 0 Axes>

Q9 - How many hours per day are/ were you able to spend outside, on average during the lockdown? ¶

In [53]:
#Count categories but do not sort by frequency
counts = df['Hours Spent Outside'].value_counts(sort=False)#Preserve order of ordinal variable
counts
Out[53]:
0-1 hours per day            22
2-3 hours per day            32
4-5 hours per day            38
6-8 hours per day             8
More than 8 hours per day     2
Name: Hours Spent Outside, dtype: int64
In [54]:
#count categories but do not sort by frequency

counts=df['Hours Spent Outside'].value_counts(sort=False)

sns.set(font_scale=1.0)
plt.figure(figsize=(10, 7),dpi=80)

plt.tick_params(labelsize=15);
plt.suptitle('How Much Fresh Air Our Respondents Had\nDuring the Covid 19 Lockdown', fontsize=18)

ax=counts.plot.barh()
ax.set_xlabel('Frequency', fontsize=16)
ax.set_ylabel('Hours Spent Outside', fontsize=16)
plt.show()
In [55]:
#As we can see from the barchart above and descriptive ststistics below:
#The mode or highest frequency is 4-5 hours per day.
#This was the most popular time band spent outside during the Covid 19 lockdown

df['Hours Spent Outside'].describe()
Out[55]:
count                   102
unique                    5
top       4-5 hours per day
freq                     38
Name: Hours Spent Outside, dtype: object
In [56]:
#Lets tabulate this data above in a coherant fashion
pd.crosstab(df['Hours Spent Outside'], df['Gender'], margins=True)
Out[56]:
Gender Female Male All
Hours Spent Outside
0-1 hours per day 6 16 22
2-3 hours per day 15 17 32
4-5 hours per day 18 20 38
6-8 hours per day 6 2 8
More than 8 hours per day 1 1 2
All 46 56 102
In [57]:
##df6 = df['Hours Spent Outside'].mode()[0]
##modee_pos = counts.index.get_loc(df6)
In [58]:
#Plot the counts as a bar chart
#ax=counts.plot.bar(rot=90)

#plt.tick_params(labelsize=15);
    
#ax.set_title('How Much Fresh Air During the Covid 19 Lockdown',fontsize=18)
#ax.set_xlabel('Frequency',fontsize=16)
#ax.set_ylabel('Hours Spent Outside',fontsize=16)

#Format counts as integers
#ax.yaxis.set_major_formatter(mpl.ticker.EngFormatter(places=0)
                             
#Find the mode so we can label it on the plot
#df6 = df['Hours Spent Outside'].mode()
#df6
                             
#Find the index of the mode in the plot
#modee_pos = counts.index.get_loc(df6)
#try:
    #ax.annotate('mode={}'.format(df6), xy=(modee_pos +0.25, 14),
            #xytext=(modee_pos + 4 + 0.7, 15),
            #arrowprops=dict(facecolor='black', shrink=0.05))
    #plt.show()
#except:
    #print("Plot is not identified")

Checkbox Question. Q5 - Do you share a home with any of the following? ¶

In [59]:
#********************************************************************#
#Who did our respondents live with during the Covid 19 Lockdown?
#********************************************************************#

"""From the value counts below, we can say that the majority (51) of "Male and Female" repondents are sharing 
   their home with other family members. Only 15 respondents lived alone during the Covid 19 Lockdown 
   (least amount of respondents)."""
Out[59]:
'From the value counts below, we can say that the majority (51) of "Male and Female" repondents are sharing \n   their home with other family members. Only 15 respondents lived alone during the Covid 19 Lockdown \n   (least amount of respondents).'
In [60]:
#1. Split strings into separate columns and create a new dataframe - Q5

home_sharing = df_new['Home Sharing'].str.split(',', expand=True)
pd.DataFrame(home_sharing)
Out[60]:
0 1 2 3
0 Your partner Other Family Memebers None None
1 No one I live alone None None
2 Your partner None None None
3 Your partner Your child/ children Other Family Memebers None
4 Other Family Memebers None None None
... ... ... ... ...
97 Other Family Memebers One or multiple housemates None None
98 No one I live alone None None
99 Other Family Memebers None None None
100 Your partner Your child/ children Other Family Memebers One or multiple housemates
101 Your partner Your child/ children Other Family Memebers None

102 rows × 4 columns

In [61]:
#2. Stack columns into a single variable

home_sharing = home_sharing.stack()             #This creates a multi-index
#home_sharing = 'home sharing'                   #Name the derived variable  #Set to categorical
home_sharing.index.names ='id','option'     #Name multi-index dimensions
df15 = pd.DataFrame(home_sharing)
df15
Out[61]:
0
id option
0 0 Your partner
1 Other Family Memebers
1 0 No one
1 I live alone
2 0 Your partner
... ... ...
100 2 Other Family Memebers
3 One or multiple housemates
101 0 Your partner
1 Your child/ children
2 Other Family Memebers

165 rows × 1 columns

In [62]:
#Joining checkbox answers with other variables i.e. Gender

df2=pd.read_csv('mental-health-survey.csv', usecols=[2,5])
df2.columns=['Gender', 'Home Sharing']
df2.index.name='id'
df2  
Out[62]:
Gender Home Sharing
id
0 Male Your partner,Other Family Memebers
1 Male No one, I live alone
2 Male Your partner
3 Male Your partner,Your child/ children,Other Family Memebers
4 Male Other Family Memebers
... ... ...
97 Male Other Family Memebers,One or multiple housemates
98 Female No one, I live alone
99 Male Other Family Memebers
100 Female Your partner,Your child/ children,Other Family Memebers,One or multiple housemates
101 Male Your partner,Your child/ children,Other Family Memebers

102 rows × 2 columns

In [63]:
#We want to check value counts for each choice in Home Sharing
#First create a new empty dataframe

df_checkbox = pd.DataFrame(columns = ['Share'])
print(df_checkbox)
Empty DataFrame
Columns: [Share]
Index: []
In [64]:
#Create a for loop to iterate the df15 dataframe from step 2
ar = []
for i in range(len(df15)):
    ar.append(df15.iloc[i,0])
    
    
for l in range(len(ar)):
    df_checkbox = df_checkbox._append({'Share' :ar[l]},
                       ignore_index = True)
In [65]:
#Now display counts for each choice in Home Sharing
df_checkbox['Share'].value_counts()
Out[65]:
Other Family Memebers         51
Your child/ children          31
Your partner                  28
One or multiple housemates    25
No one                        15
 I live alone                 15
Name: Share, dtype: int64

Likert Scale Questions/ Key Variables¶

In [66]:
unhappy_data = df_new['Unhappy_or_Depressed'].value_counts()
unhappy_df=pd.DataFrame(unhappy_data)
unhappy_df
Out[66]:
Unhappy_or_Depressed
Sometimes 44
Often 27
Seldom 24
Almost Always 7
In [67]:
Anxious_data = df_new['Anxious or Low Confidence'].value_counts()
Anxious_df=pd.DataFrame(Anxious_data)
Anxious_df
Out[67]:
Anxious or Low Confidence
Often 43
Sometimes 38
Seldom 10
Almost Always 6
Never 5
In [68]:
Stressed_data = df_new['Stressed'].value_counts()
Stressed_df = pd.DataFrame(Stressed_data)
Stressed_df
Out[68]:
Stressed
Sometimes 48
Often 24
Seldom 24
Almost Always 6
In [69]:
df38 = pd.DataFrame(columns = ['Question','Sometimes','Often','Seldom','Almost_Always','Never','Gender'])
print(df38)
Empty DataFrame
Columns: [Question, Sometimes, Often, Seldom, Almost_Always, Never, Gender]
Index: []
In [70]:
try:
    df38 = df38.append({'Question':"How often did you feel unhappy/ depressed",'Sometimes':unhappy_df.iloc[0,0], 'Often' :unhappy_df.iloc[1,0],'Seldom':unhappy_df.iloc[2,0], 'Almost_Always' :unhappy_df.iloc[3,0],'Never' :0,'Gender':'Gender'},
                   ignore_index = True)
    df38= df38.append({'Question':"How often did you feel under stress",'Sometimes':Stressed_df.iloc[0,0], 'Often' :Stressed_df.iloc[1,0],'Seldom':Stressed_df.iloc[2,0], 'Almost_Always' :Stressed_df.iloc[3,0],'Never':0,'Gender':'Gender'},
                   ignore_index = True)
    df38 =df38.append({'Question':"How often did you feel anxious/ a loss of confidence",'Sometimes':Anxious_df.iloc[0,0], 'Often' :Anxious_df.iloc[1,0],'Seldom':Anxious_df.iloc[2,0],'Almost_Always' :Anxious_df.iloc[3,0],'Never' :Anxious_df.iloc[4,0],'Gender':'Gender'},
                   ignore_index = True)
except:
    print("Please add appropriate column name and value")
In [71]:
#Print dataframe with all three sub questions and associated options
df38
Out[71]:
Question Sometimes Often Seldom Almost_Always Never Gender
0 How often did you feel unhappy/ depressed 44 27 24 7 0 Gender
1 How often did you feel under stress 48 24 24 6 0 Gender
2 How often did you feel anxious/ a loss of confidence 43 38 10 6 5 Gender
In [72]:
#************************************#
#How often did our respondents:#
#1. Feel unhappy or depressed?#
#2. Feel under stress#
#3. Feel anxious/ loss of confidence#
#************************************#

"""From the DataFrame above and Likert Chart below, we can infer that most of our respondents felt:
unhappy/depressed, under stress and a loss of confidence only 'Sometimes'. 'Often' was the next highest frequency
count for our respondents. Only a very small minority 'never' experienced any of these symptoms """


wrapper = textwrap.TextWrapper(width=50)
df38['Question'] = df38['Question'].apply(lambda x: wrapper.fill(x))


colors = ['white', 'firebrick','lightcoral','gainsboro','cornflowerblue', 'darkblue']
df38 = df38.set_index('Question')



middle = df38[["Sometimes", "Often"]].sum(axis=1) + df38["Seldom"]*.1
longest = middle.max()*1.1


df38.insert(0, '', (middle - longest).abs())
ax = df38[df38['Gender']=='Gender'].dropna(axis=1).plot.barh(
    stacked=True, 
    color=colors,
    figsize=(10,7),
    edgecolor='none'
    
)


z = plt.axvline(longest, linestyle='--', color='black', alpha=.5)
z.set_zorder(-1)


comp_long = int(df38.sum(axis=1).max()*1.05) 
plt.xlim(0, comp_long)



xvalues = [longest+50-50*i for i in range(4)]
xlabels = ['{:4.0f}'.format(x-longest) for x in xvalues]
plt.xticks(xvalues, xlabels)


box = ax.get_position()
ax.set_position([box.x0, box.y0 + box.height * 0.1,
                 box.width, box.height * 0.8])

# create the legend
handler, label = ax.get_legend_handles_labels()
ax.legend(
    handler, 
    label, 
    loc='upper center', 
    bbox_to_anchor=(.5, -0.2), 
    ncol=5, 
    edgecolor='white'
)

# set labels and title
plt.xlabel('Count',fontsize= 15)
plt.ylabel('',fontsize= 15)
plt.title('How often did you feel:',fontsize= 18)
plt.show()

Q7. During the Covid 19 lockdown, please state the effects and restraints on your wellbeing:¶

7.1 How often did you feel unhappy or depressed?¶

Age Based¶

In [73]:
#Finding the number of categories

sometimes = df_new['Unhappy_or_Depressed'] == 'Sometimes'
often = df_new['Unhappy_or_Depressed'] == 'Often'
seldom = df_new['Unhappy_or_Depressed'] == 'Seldom'
AlmostAlways = df_new['Unhappy_or_Depressed'] == 'Almost Always'
In [74]:
#number of people according to the Unhappy or Depressed categories
#agee = df_new['Unhappy or Depressed'].replace(unhappy)
#we're unable to change categorical to numerical data. hence, we assign a numerical scale to our likert data

age = {'Under 18': 0, '18-24': 1, '25-34':2, '35-44':3, '45-54':4, '55-64':5}
ageCol = df_new['Age'].replace(age)
ageCol.value_counts()
Out[74]:
3    35
2    32
4    20
1     9
0     3
5     3
Name: Age, dtype: int64
In [75]:
#Finding frequency
unCount1 = ageCol[sometimes].dropna()
unCount2 = ageCol[often].dropna()
unCount3 = ageCol[seldom].dropna()
unCount4 = ageCol[AlmostAlways].dropna()
In [76]:
#Finding Frequency
unFreq1 = unCount1.value_counts()
unFreq2 = unCount2.value_counts()
unFreq3 = unCount3.value_counts()
unFreq4 = unCount4.value_counts()
In [77]:
#*************************************************#
#How unhappy or depressed are different age groups?
#*************************************************#

"""This chart below and value counts above highlight that there's a relationship with unhappiness or depression,
in relation to age.  Apart from age group 2(25-34 year olds), there is a decline in unhappiness/ depression as 
our respondent's age increases.  
   It also highlights that our respondents that are unhappy, feel depressed 'sometimes', but not 'almost always'.
   The majority of the older three age groups (3,4,and 5) 'seldom' (not often) felt unhappy or depressed during
   the Covid 19 lockdown.
"""
plotdata = pd.DataFrame({'Sometimes': unFreq1, 'Often': unFreq2, 'Seldom': unFreq3, 'Almost Always': unFreq4})

plt.figure(figsize=(13, 9))

ax = plotdata.plot(kind='bar', stacked=True, figsize=(10,7),fontsize=18)
ax.set_xlabel('Age Groups:\n(0=Under 18, 1=18-24, 2=25-34, 3=35-44, 4=45-54, 5=55-64)',fontsize=16)
ax.set_ylabel('Number of Respondents Unhappy',fontsize=16)
plt.suptitle('Unhappiness or Depression Among Different Age Groups', fontsize=18)
Out[77]:
Text(0.5, 0.98, 'Unhappiness or Depression Among Different Age Groups')
<Figure size 1300x900 with 0 Axes>

Gender Based¶

In [78]:
#Encoding

unhappy = {'Sometimes': 0, 'Often': 1, 'Seldom': 2, 'Almost Always':3}
In [79]:
male = df_new['Gender'] == 'Male'
female = df_new['Gender'] == 'Female'
In [80]:
#Frequency of Unhappy or Depressed
#Value counts

un = df_new['Unhappy_or_Depressed'].replace(unhappy)
un.value_counts()
Out[80]:
0    44
1    27
2    24
3     7
Name: Unhappy_or_Depressed, dtype: int64
In [81]:
#for plotting
unCount1 = un[male].dropna()
unCount2 = un[female].dropna()
In [82]:
#for plotting
unFreq1 = unCount1.value_counts()
unFreq2 = unCount2.value_counts()
In [83]:
#****************************************************#
#How unhappy or depressed are different gender groups?
#****************************************************#

"""This chart below and value counts above highlight, that most of our respondents are feeling unhappy/ depressed 
only 'Sometimes'. More males appear to be 'Often' unhappy or depressed, compared to their female counterparts. 
The minority of our respondents are 'Almost always' unhappy or depressed, with slighly more females than males """

plotdata = pd.DataFrame({'Male': unFreq1, 'Female': unFreq2})
ax = plotdata.plot(kind='barh', stacked=True, fontsize=20, figsize=(10,7))
ax.set_xlabel('Number of Respondents Based on Gender',fontsize=16)
ax.set_ylabel('Unhappy or Depressed:\n(Sometimes: 0, Often: 1, Seldom: 2, Almost Always:3)',fontsize=16)
plt.suptitle('Unhappiness or Depression Level Among Different Genders', fontsize=18)
Out[83]:
Text(0.5, 0.98, 'Unhappiness or Depression Level Among Different Genders')

Q6. If you compare the pre-COVID-19 lockdown and the actual COVID-19 lockdown, how has:¶

6.1 Your amount of communication with family and friends changed?¶

Gender Based¶

In [84]:
#Encoding the categorical data
#We're unable to change categorical to numerical data. Hence, we assign a numerical scale to our likert data

ImpactCommunication = {'Same as Usual': 0, 'More Than Usual': 1, 'Less Than Usual': 2}
In [85]:
#value counts
#Replacing the actual values with encoded values

impact = df_new['Impact on Communication'].replace(ImpactCommunication)
impact.value_counts()
Out[85]:
1    35
0    35
2    32
Name: Impact on Communication, dtype: int64
In [86]:
#number of Gender category
male = df_new['Gender'] == 'Male'
female = df_new['Gender'] == 'Female'
In [87]:
#For plotting
impactScores1 = impact[male].dropna()
impactScores2 = impact[female].dropna()
In [88]:
#For plotting
impactFreq1 = impactScores1.value_counts()
impactFreq2 = impactScores2.value_counts()
In [89]:
#Frequency table
plotdata = pd.DataFrame({'Male': impactFreq1, 'Female': impactFreq2})
plotdata.head()
Out[89]:
Male Female
0 18 17
1 20 15
2 18 14
In [90]:
#****************************************************#
#What is the impact on communication - Based on gender?
#****************************************************#

"""
This chart below and value counts above highlight that our respondents are stating the impact on communication 
remains the same (35 respondents), which is true.  Our communication is still on going, thanks to the Internet.

However, not far behind, 32 respondents state they have less communication with family and friends.
There is a faily even balance between male and females, in all 3 types of responses.

"""

ax = plotdata.plot(kind='barh', stacked=True, fontsize=16, figsize=(10,7))
ax.set_xlabel('Number of Respondents Based on Gender',fontsize=16)
ax.set_ylabel("Impact on Communication:\n(Same as Usual: 0, More Than Usual: 1, Less Than Usual: 2",fontsize=16)
plt.suptitle('Impact on Communication Level on Different Genders', fontsize=18)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))#insert legend box, outside chart
Out[90]:
<matplotlib.legend.Legend at 0x7ff611c7d670>

Age Based¶

In [91]:
#number of Age category
age18 = df_new['Age'] == 'Under 18'
age18_24 = df_new['Age'] == '18-24'
age25_34 = df_new['Age'] == '25-34'
age35_44 = df_new['Age'] == '35-44'
age45_54 = df_new['Age'] == '45-54'
age55_64 = df_new['Age'] == '55-64'
In [92]:
#For plotting
impactScores1 = impact[age18].dropna()
impactScores2 = impact[age18_24].dropna()
impactScores3 = impact[age25_34].dropna()
impactScores4 = impact[age35_44].dropna()
impactScores5 = impact[age45_54].dropna()
impactScores6 = impact[age55_64].dropna()
In [93]:
#For plotting
impactFreq1 = impactScores1.value_counts()
impactFreq2 = impactScores2.value_counts()
impactFreq3 = impactScores3.value_counts()
impactFreq4 = impactScores4.value_counts()
impactFreq5 = impactScores5.value_counts()
impactFreq6 = impactScores6.value_counts()
In [94]:
#*****************************************#
#Impact on Communication - Based on age group
#*****************************************#

"""This chart below highlights that our repondents that are aged between 25 and 34 felt the impact, of lack of 
communication the most. However, our repondents that are aged between 35 and 44 highlighted they experienced more 
communication during the Covid 19 lockdown.
"""
plotdata = pd.DataFrame({'Under 18': impactFreq1, '18-24': impactFreq2, '25-34':impactFreq3, '35-44':impactFreq4, '45-54':impactFreq5, '55-64':impactFreq6})
ax = plotdata.plot(kind='barh', stacked=True, fontsize=16, figsize=(10,7))
ax.set_xlabel('Number of Respondents Based on Age Groups',fontsize=16)
ax.set_ylabel("Impact on Communication:\n(Same as Usual: 0, More Than Usual: 1, Less Than Usual: 2",fontsize=16)
plt.suptitle('Impact on Communication Level Among Different Age Groups', fontsize=18)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))#insert legend box, outside chart

#*Reference 7 (refer to the foot of this notebook) was used to summarize the above.
Out[94]:
<matplotlib.legend.Legend at 0x7ff611d7d8e0>

Q6. If you compare the pre-COVID-19 lockdown and the actual COVID-19 lockdown, how has:¶

6.2 Your financial status changed?¶

Gender Based¶

In [95]:
#Encoding
financial = {'Less Than Usual': 0, 'Same as Usual': 1, 'More Than Usual': 2}
In [96]:
#value counts
#Replacing the actual values with encoded values

fin = df_new['Impact on Financial Status'].replace(financial)
fin.value_counts()
Out[96]:
0    48
1    42
2    12
Name: Impact on Financial Status, dtype: int64
In [97]:
#For plotting
finCount1 = fin[male].dropna()
finCount2 = fin[female].dropna()
In [98]:
#For plotting
finFreq1 = finCount1.value_counts()
finFreq2 = finCount2.value_counts()
In [99]:
#******************************************#
#Impact on financial status - Based on gender
#******************************************#

"""This chart below and value counts above highlight that the impact on financial status is very significant. 
   More respondents (48 repondents) are stating that their financial status has declined (i.e. they now earn 'less 
   than usual'. This reflects the people who lost their jobs or were furloughed during the Covid 19 lockdown.
   The chart also highlights there are more males than females in this predicament.
"""
plotdata = pd.DataFrame({'Male': finFreq1, 'Female': finFreq2})
ax = plotdata.plot(kind='barh', stacked=True, fontsize=16, figsize=(10,7))
ax.set_xlabel('Number of Repondents Based on Gender',fontsize=16)
ax.set_ylabel('Impact on Financial Status:\n(Less Than Usual: 0, Same as Usual: 1, More Than Usual: 2)',fontsize=16)
plt.suptitle('Impact on Financial Status Level on Different Genders', fontsize=18)
Out[99]:
Text(0.5, 0.98, 'Impact on Financial Status Level on Different Genders')

Q8. During the Covid 19 lockdown, please state the effects and restraints on your daily life:¶

8.1. How satisfied were you with your study and/ or work?¶

Gender Based¶

In [100]:
#Encoding
study = {'Neutral': 0, 'Unsatisfied': 1, 'Satisfied': 2, 'Very Unsatisfied':3}
In [101]:
#Frequency of Study or Work Satisfaction 
stud = df_new['Study or Work Satisfaction'].replace(study)
stud.value_counts()
Out[101]:
0    39
1    35
2    22
3     6
Name: Study or Work Satisfaction, dtype: int64
In [102]:
#For plotting
studyCount1 = stud[male].dropna()
studyCount2 = stud[female].dropna()
In [103]:
#For plotting
studyFreq1 = studyCount1.value_counts()
studyFreq2 = studyCount2.value_counts()
In [104]:
#********************************************#
#Study or Work Satisfaction - Based on Gender#
#********************************************#

"""Most of our respondent's stance was neutral (39 counts), based on study or work satisfaction during the 
   Covid 19 lockdown. However, not far behind, 35 respondents stated they were unsatisfied with their study/ work 
   during this lockdown. The least amount of respondents (6 counts) stated they were 'Very unsatisfied'.
   There is a faily even balance between males and females, in all 4 types of responses.
   

"""

plotdata = pd.DataFrame({'Male': studyFreq1, 'Female': studyFreq2})
ax=plotdata.plot(kind='barh', stacked=True, fontsize=16, figsize=(10,7))
ax.set_xlabel('Number of Respondents Based on Gender', fontsize=16)
ax.set_ylabel('Study or Work Satisfaction:\n(Neutral: 0, Unsatisfied: 1, Satisfied: 2, Very Unsatisfied:3)',fontsize=16)
plt.suptitle('Study or Work Satisfaction Among Different Genders', fontsize=18)
Out[104]:
Text(0.5, 0.98, 'Study or Work Satisfaction Among Different Genders')

Age Based¶

In [105]:
#number of Age category
age18 = df_new['Age'] == 'Under 18'
age18_24 = df_new['Age'] == '18-24'
age25_34 = df_new['Age'] == '25-34'
age35_44 = df_new['Age'] == '35-44'
age45_54 = df_new['Age'] == '45-54'
age55_64 = df_new['Age'] == '55-64'
In [106]:
#For plotting
impactScores1 = impact[age18].dropna()
impactScores2 = impact[age18_24].dropna()
impactScores3 = impact[age25_34].dropna()
impactScores4 = impact[age35_44].dropna()
impactScores5 = impact[age45_54].dropna()
impactScores6 = impact[age55_64].dropna()
In [107]:
#For plotting
impactFreq1 = impactScores1.value_counts()
impactFreq2 = impactScores2.value_counts()
impactFreq3 = impactScores3.value_counts()
impactFreq4 = impactScores4.value_counts()
impactFreq5 = impactScores5.value_counts()
impactFreq6 = impactScores6.value_counts()
In [108]:
#For plotting
finScores1 = fin[age18].dropna()
finScores2 = fin[age18_24].dropna()
finScores3 = fin[age25_34].dropna()
finScores4 = fin[age35_44].dropna()
finScores5 = fin[age45_54].dropna()
finScores6 = fin[age55_64].dropna()
In [109]:
#For plotting
finFreq1 = finScores1.value_counts()
finFreq2 = finScores2.value_counts()
finFreq3 = finScores3.value_counts()
finFreq4 = finScores4.value_counts()
finFreq5 = finScores5.value_counts()
finFreq6 = finScores6.value_counts()
In [110]:
#************************************************#
#Impact on financial status - Based on age groups#
#************************************************#


"""This chart below highlights that most of our respondents who are between the age of 18 to 54 faced
   the brunt of the impact on their financial status i.e. earned 'Less than usual' during the Covid 19 lockdown
   There is also a small minority in the 25-54 age band who earned 'More than usual' during the Covid
   19 lockdown.  Deeper research is required to reveal futher insights.
   
"""
plotdata = pd.DataFrame({'Under 18': finFreq1, '18-24': finFreq2, '25-34': finFreq3, '35-44': finFreq4, '45-54': finFreq5, '55-64': finFreq6})
ax = plotdata.plot(kind='barh', stacked=True, fontsize=16, figsize=(10,7))
ax.set_xlabel('Number of Respondents Based on Age Groups',fontsize=16)
ax.set_ylabel('Impact on Financial Status:\n(Less Than Usual: 0, Same as Usual: 1, More Than Usual: 2)',fontsize=16)
plt.suptitle('Impact on Financial Status Among Different Age Groups', fontsize=18)
Out[110]:
Text(0.5, 0.98, 'Impact on Financial Status Among Different Age Groups')
In [111]:
#For plotting
unScores1 = un[age18].dropna()
unScores2 = un[age18_24].dropna()
unScores3 = un[age25_34].dropna()
unScores4 = un[age35_44].dropna()
unScores5 = un[age45_54].dropna()
unScores6 = un[age55_64].dropna()
In [112]:
#For plotting
unFreq1 = unScores1.value_counts()
unFreq2 = unScores2.value_counts()
unFreq3 = unScores3.value_counts()
unFreq4 = unScores4.value_counts()
unFreq5 = unScores5.value_counts()
unFreq6 = unScores6.value_counts()

Box Plots¶

Unhappy or Depressed and Age Variables¶

In [113]:
#Age Group frequency
df_new['Age'].value_counts()
Out[113]:
35-44       35
25-34       32
45-54       20
18-24        9
Under 18     3
55-64        3
Name: Age, dtype: int64
In [114]:
#Encoding
age = {'Under 18': 0, '18-24': 1, '25-34':2, '35-44':3, '45-54':4, '55-64':5}
In [115]:
#Replacing with encoded values
ageCol = df_new['Age'].replace(age)
In [116]:
ageCol
Out[116]:
0      3
1      3
2      3
3      3
4      0
      ..
97     0
98     2
99     2
100    3
101    2
Name: Age, Length: 102, dtype: int64
In [117]:
import seaborn as sns
In [118]:
#***************************************************#
#Boxplot based on Unhappy or Depressed and Age Group
#***************************************************#

"""
This boxtplot below highlights that our respondents who are unhappy or depressed, mostly range between the age of 
18 to 64 
"""

fig, ax = plt.subplots(figsize=(18, 14), dpi = 150)
sns.set_style('darkgrid')

sns.boxplot(x='Unhappy_or_Depressed',y=ageCol,data=df_new, ax=ax).set_title('Unhappy or Depressed Boxplot - Based on Age Groups',fontsize=20)


plt.ylabel('Age:\n(0=Under 18, 1=18-24, 2=25-34, 3=35-44, 4=45-54, 5=55-64)', fontsize = 20)
plt.xlabel('Unhappy or Depressed', fontsize = 20)
plt.xticks(rotation = 360, fontsize = 16)
plt.yticks(rotation = 0, fontsize = 16)

plt.show()
In [119]:
#Replacing with the encoded values
df_new['Age'] = df_new['Age'].replace(age)
In [120]:
#*********************************************************#
#Boxplot based on Study or Work Satisfaction and Age groups
#*********************************************************#

"""This boxplot highlights that most of our repondents were 'Very unsatisfied' with their study or work, during 
    the Covid 19 lockdown. That is, age groups 0 to 4 (under 18 to 54 years old) inclusive.
    
    This leaves age group 5 (ages 55 to 64).  This age group appears to be the only unique maximum out of all four 
    options. Infering, the older generation may not get phased with traumatic experiences. It can be identified in
    the 'Satisifed' area of the boxplot.  
    We have also identified some outliers for this 'Satisfied' area of the boxplot, they are age groups 0-1 
    (under 18 to 24 years old) inclusive. In other words, it was rare for this younger generation to be satisfied
    with work/ study during the Covid 19 lockdown. """
    
   

fig, ax = plt.subplots(figsize=(18, 14), dpi = 150)
sns.set_style('darkgrid')

sns.boxplot(x='Study or Work Satisfaction',y=ageCol,data=df_new, ax=ax).set_title('Study or Work Satisfaction Boxplot Based on Different Age Groups',fontsize=20)


plt.ylabel('Age:\n(0=Under 18, 1=18-24, 2=25-34, 3=35-44, 4=45-54, 5=55-64)', fontsize = 20)
plt.xlabel('Study or Work Satisfaction', fontsize = 20)
plt.xticks(rotation = 360, fontsize = 16)
plt.yticks(rotation = 0, fontsize = 16)

plt.show()
In [121]:
#*******************************************************
#Boxplot based on Impact on Communication and Age groups
#*******************************************************

"""This boxplot highlights that our respondents who are aged between, under 18 and 54, had a 'More than usual' 
   impact on their communication. In other words, they experienced more communication during the Covid 19 lockdown.  
   The median age group for this 'More than usual' selection was 35-44 years old (group 3).
   The minimum/ 1st Quartile are age groups 0 to 2 (under 18 to 34) inclusive.
   The maximum/ 3rd Quartile are age groups 3.5 to 4 (estimated age 40 to 54 ) inclusive"""


fig, ax = plt.subplots(figsize=(18, 14), dpi = 150)
sns.set_style('darkgrid')

sns.boxplot(x='Impact on Communication',y=ageCol,data=df_new, ax=ax).set_title('Impact on Communication Boxplot Based on Different Age Groups',fontsize=20)


plt.ylabel('Age:\n(0=Under 18, 1=18-24, 2=25-34, 3=35-44, 4=45-54, 5=55-64)', fontsize = 20)
plt.xlabel('Impact on Communication', fontsize = 20)
plt.xticks(rotation = 360, fontsize = 16)
plt.yticks(rotation = 0, fontsize = 16)

plt.show()
In [122]:
#**********************************************************
#Boxplot based on Impact on Financial Status and Age groups
#**********************************************************

"""This boxplot below highlights that there are a lot of respondents beween the age of 25 to 35 whose financial 
status declined.
"""
fig, ax = plt.subplots(figsize=(18, 14), dpi = 150)
sns.set_style('darkgrid')

sns.boxplot(x='Impact on Financial Status',y=ageCol,data=df_new, ax=ax).set_title('Impact on Financial Status Boxplot Based on Different Age Groups',fontsize=20)


plt.ylabel('Age:\n(0=Under 18, 1=18-24, 2=25-34, 3=35-44, 4=45-54, 5=55-64)', fontsize = 20)
plt.xlabel('Impact on Financial Status', fontsize = 20)
plt.xticks(rotation = 360, fontsize = 16)
plt.yticks(rotation = 0, fontsize = 16)

plt.show()

Univariate and Multivariate Analysis on Key Variables ¶

In [123]:
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
In [124]:
#label encoding the column that matters i.e. relevant key variables
df_new['Impact on Financial Status']= label_encoder.fit_transform(df_new['Impact on Financial Status'])
df_new['Gender']= label_encoder.fit_transform(df_new['Gender'])
df_new['Impact on Communication']= label_encoder.fit_transform(df_new['Impact on Communication'])
df_new['Unhappy_or_Depressed']= label_encoder.fit_transform(df_new['Unhappy_or_Depressed'])
df_new['Study or Work Satisfaction']= label_encoder.fit_transform(df_new['Study or Work Satisfaction'])
In [125]:
"""These Univariate and Multivariate analysis plots show that if we apply any algorithms, any linearly separable 
algorithm will be suffice. In other words, we can distinguish the data points easily and they are not
overlapped with each other. Then any linear line based algorithm such as Linear Regression or Logistic Regression 
would be enough for this dataset 
"""
pair = sns.pairplot(df_new)
pair.fig.suptitle("Univariate and Multivariate analysis", y=1.08)
Out[125]:
Text(0.5, 1.08, 'Univariate and Multivariate analysis')

Checking Skewness and Kurtosis on Key Variables¶

In [126]:
#Skewness is a measure of the symmetry in a distribution. A symmetrical data set will have a skewness equal to 0. 
#Hence, a normal distribution will have a skewness of 0.
#The objective of choosing features/columns with skewed features greater than 0.75 is because we're searching to 
#work with features that have a more normal like distribution, so this is why we apply a log transformation, so to
#normalize the distribution i.e.skewness=0.

#*Reference 8 (refer to the foot of this notebook) was used to summarize the above.
In [127]:
#This 'Impact on Financial Status' column is not skewed. 
fig, ax = plt.subplots(figsize=(13, 10))
sns.distplot(df_new['Impact on Financial Status'])
ax.set_title('Impact on Financial Status - Skewness', fontsize=20)

plt.xticks(rotation = 360, fontsize = 16)
plt.yticks(rotation = 0, fontsize = 16)
plt.ylabel('Density', fontsize = 20)
plt.xlabel('Impact on Financial Status', fontsize = 20)

plt.show()
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
In [128]:
#This 'Impact on Communication'column is also not skewed
fig, ax = plt.subplots(figsize=(13, 10))
sns.distplot(df_new['Impact on Communication'])
ax.set_title('Impact on Communication - Skewness', fontsize=20)

plt.xticks(rotation = 360, fontsize = 16)
plt.yticks(rotation = 0, fontsize = 16)
plt.ylabel('Density', fontsize = 20)
plt.xlabel('Impact on Communication', fontsize = 20)

plt.show()
In [129]:
#This 'Unhappy or Depressed' column is a little bit negatively skewed

fig, ax = plt.subplots(figsize=(13, 10))
sns.distplot(df_new['Unhappy_or_Depressed'])
ax.set_title('Unhappy or Depressed - Skewness', fontsize=20)

plt.xticks(rotation = 360, fontsize = 16)
plt.yticks(rotation = 0, fontsize = 16)
plt.ylabel('Density', fontsize = 20)
plt.xlabel('Unhappy or Depressed', fontsize = 20)

plt.show()
In [130]:
#This 'Study or Work Satisfaction' column is a little bit positively skewed

fig, ax = plt.subplots(figsize=(13, 10))
sns.distplot(df_new['Study or Work Satisfaction'])
ax.set_title('Study or Work Satisfaction - Skewness', fontsize=20)

plt.xticks(rotation = 360, fontsize = 16)
plt.yticks(rotation = 0, fontsize = 16)
plt.ylabel('Density', fontsize = 20)
plt.xlabel('Study or Work Satisfaction', fontsize = 20)

plt.show()
In [131]:
#***************************************************************************************************************
#The skewness of the columns is less than 0.75. Hence, there is no requirement to handle the skewness. If there is 
#a skewness greater than 0.75, then we should handle that skewness by log transformation. However, in this case it
#is not necessary.
#Refer to skewness values for features/ columns below:
#***************************************************************************************************************

df_new.skew()
Out[131]:
Age                          -0.201356
Gender                       -0.199981
Impact on Communication      -0.054436
Impact on Financial Status    0.118847
Unhappy_or_Depressed         -0.497372
Study or Work Satisfaction    0.224462
dtype: float64
In [132]:
#***************************************************************************************************************
#It appears that the columns are mesokurtic in nature because of their negative values. This is acceptable 
#because the kurtosis of a univariate normal distribution (i.e. bell curve) is three. 
#If a distribution’s kurtosis is less than three, then the distribution produces 
#fewer and less extreme outliers than the normal distribution. A kurtosis greater 
#than three highlights the distribution produces more outliers than the normal distribution.
#***************************************************************************************************************

#Ref: https://en.wikipedia.org/wiki/Kurtosis

df_new.kurtosis()
Out[132]:
Age                          -0.073757
Gender                       -1.999612
Impact on Communication      -1.487382
Impact on Financial Status   -1.884858
Unhappy_or_Depressed         -1.030484
Study or Work Satisfaction   -1.284739
dtype: float64

Outliers¶

In [133]:
#****************************************************************************************************************
#Since there isn't any significant skewness in the columns/ features, then there aren't any 'significant' outliers 
#More skewed distribution tends to have more outliers. Whereas, it is quite the opposite for less skewed
#distributions.
#****************************************************************************************************************

1. Computing Pearson Correlation Scores between features¶

In [134]:
#**************************************************************************************************************
#There may be hidden and intricate relationships between the variables/ features in this dataset. Hence, we 
#require to compute and setup a correlation scores matrix, as can be seen below. Spearman's correlation was used 
#to summarize the linear relationship between any two data samples.

#It was important to focus on the key variables, from the likert scale questions.

#We opted for Spearman's Correlation over other correlations* i.e. Pearson Correlation for the following reasons: 
#Pearson correlation measures the linear relationship between two continuous variables and the Spearman 
#correlation measures the data sets based on their ranks for each variable. Spearman does not require assumption 
#of the relationship between variables, but it is always good to examine the relationship between variables. The 
#main difference between them is that Pearson is most appropriate for measurements from an interval scale and the 
#Spearman correlation is appropriate for measurements taken from ordinal scales - the majority of our data types.
#****************************************************************************************************************

#*Reference 9 (refer to the foot of this notebook) was used to summarize the above.
In [135]:
#Use key variables i.e. likert scale features
df_hm = df_new[['Age', 'Gender', 'Region', 'Impact on Communication', 'Impact on Financial Status', 'Unhappy_or_Depressed', 'Study or Work Satisfaction']].copy()
In [136]:
# 1) computing correlation scores
correlation_scores = df_hm.corr(method = 'spearman')

correlation_scores
Out[136]:
Age Gender Impact on Communication Impact on Financial Status Unhappy_or_Depressed Study or Work Satisfaction
Age 1.000000 -0.132517 -0.020084 0.016727 0.212969 0.086226
Gender -0.132517 1.000000 -0.040468 -0.163612 -0.017754 -0.005663
Impact on Communication -0.020084 -0.040468 1.000000 -0.112090 0.009470 -0.051955
Impact on Financial Status 0.016727 -0.163612 -0.112090 1.000000 0.105498 -0.112715
Unhappy_or_Depressed 0.212969 -0.017754 0.009470 0.105498 1.000000 -0.128693
Study or Work Satisfaction 0.086226 -0.005663 -0.051955 -0.112715 -0.128693 1.000000

Correlation Heatmap¶

In [137]:
#***************************************************************************************************************
#From the heatmap below or correlation scores above, we can determine some strong, positive  correlations.

#1. Age versus Unhappy_or_Depressed: There's a strong, positive correlation between these two variables. There is a 
#decline in unhappiness/ depression as our respondent's age increases i.e. the older our repondents are, the
#happier they were during the Covid 19 lockdown.

#2. Impact on Financial Status versus Unhappy_or_Depressed: There's a strong, positive correlation between these 
#two variables.  There is an increase in unhappiness as our repondent's financial status declines.

# correlation Heatmap
plt.figure(figsize = (10, 10), dpi = 200)
df_hm = sns.heatmap(correlation_scores, vmax = 0.9, linewidths = 0.5, cmap = "YlGnBu",
                annot = True, square = True)

bottom, top = df_hm.get_ylim()
df_hm.set_ylim(bottom + 0.5, top - 0.5)
plt.title('Correlation Heatmap')
plt.show()

Q12. Have you been tempted to access support from social groups during the covid-19 pandemic?¶

In [138]:
df['Supported_Activities']=df['Supported_Activities'].astype('category')
df['Healthcare_Access']=df['Healthcare_Access'].astype('category')
In [139]:
#12
convert_cat_to_num = {"Social_Support": {"I don't think I need social support": 0, "No - I became unable to use the social support groups I was using before the pandemic": 1,"No - I think I need social support but cannot access them to begin with":2,"Yes - I started using social support groups after the start of the pandemic, and able to access them":3,"Yes - I was using social support groups before, and continue to do so":4}}
df = df.replace(convert_cat_to_num )
df.head()
Out[139]:
Age Gender Region SOM_Affected Home Sharing Impact on Communication Impact on Financial Status Unhappy_or_Depressed Anxious or Low Confidence Stressed Study or Work Satisfaction Routine Satisfaction New Activities Satisfaction Hours Spent Outside Psychological_Reactions Supported_Activities Social_Support Healthcare_Access
0 35-44 Male Africa Yes Your partner,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 2-3 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacc... Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study 0 I didn't need access to healthcare during Covid 19
1 35-44 Male Europe Yes No one, I live alone NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions Doing exercise,Watching TV i.e. not the news,Doing remote work or study 0 I didn't need access to healthcare during Covid 19
2 35-44 Male Asia Yes Your partner NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching TV i.e. not the news,Doing housework,Doing remote work or study 1 I couldn't find over the counter medication in pharmacies or shops
3 35-44 Male Europe Yes Your partner,Your child/ children,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching/ Reading the news,Doing remote work or study 2 I needed a GP appointment and couldn't get one at all
4 Under 18 Male Asia Yes Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chron... Doing exercise,Watching TV i.e. not the news,Doing housework,Playing music 2 I didn't need access to healthcare during Covid 19
In [140]:
df['Social_Support'].value_counts(sort=False)
Out[140]:
0    57
1    10
2    22
3    11
4     2
Name: Social_Support, dtype: int64
In [141]:
#Plot bar chart for Social Support Needs

"""It's quite evident from the barchart and value counts below, that most (57) of our repondents thought 
   they didn't require social support during the Covid 19 lockdown. The gender split is very even and the majority
   (18 respondents) are in the 35-44 age band.
   The minority of our respondents (2) were using social groups before the Covid 19 lockdown, and continue to do 
   so. The gender split is all Males (2 respondents) and those 2 males are evenly split across the 25-34 and 35-44
   age groups. 
   """

counts=df['Social_Support'].value_counts(sort=False)

sns.set(font_scale=1.0)
plt.figure(figsize=(10, 7),dpi=80)

plt.tick_params(labelsize=15);
plt.suptitle('Social Support Needed or Not', fontsize=18)

ax=counts.plot.barh()
ax.set_xlabel('Frequency',fontsize=16)
ax.set_ylabel("Social Support",fontsize=16)
plt.show()
In [142]:
pd.crosstab(df['Social_Support'], df['Gender'], margins=True)
Out[142]:
Gender Female Male All
Social_Support
0 28 29 57
1 4 6 10
2 10 12 22
3 4 7 11
4 0 2 2
All 46 56 102
In [143]:
pd.crosstab(df['Social_Support'], df['Age'], margins=True)
Out[143]:
Age Under 18 18-24 25-34 35-44 45-54 55-64 All
Social_Support
0 1 9 12 18 14 3 57
1 0 0 6 3 1 0 10
2 2 0 7 10 3 0 22
3 0 0 6 3 2 0 11
4 0 0 1 1 0 0 2
All 3 9 32 35 20 3 102

Q13 - Do you feel that the Covid 19 pandemic has affected your access to healthcare for other conditions?¶

In [144]:
convert_cat_to_num2 = {"Healthcare_Access": {"I didn't need access to healthcare during Covid 19": 0, "I couldn't find over the counter medication in pharmacies or shops": 1,"I couldn't get my prescription medication":2,"I needed a GP appointment and could only get a telephone or email one":3,"I needed a GP appointment and couldn't get one at all":4,"My regular hospital-based treatment was disrupted":5,"Planned procedures that I required, were cancelled":6}}
df = df.replace(convert_cat_to_num2)
df.head()
Out[144]:
Age Gender Region SOM_Affected Home Sharing Impact on Communication Impact on Financial Status Unhappy_or_Depressed Anxious or Low Confidence Stressed Study or Work Satisfaction Routine Satisfaction New Activities Satisfaction Hours Spent Outside Psychological_Reactions Supported_Activities Social_Support Healthcare_Access
0 35-44 Male Africa Yes Your partner,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 2-3 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacc... Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study 0 0
1 35-44 Male Europe Yes No one, I live alone NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions Doing exercise,Watching TV i.e. not the news,Doing remote work or study 0 0
2 35-44 Male Asia Yes Your partner NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching TV i.e. not the news,Doing housework,Doing remote work or study 1 1
3 35-44 Male Europe Yes Your partner,Your child/ children,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching/ Reading the news,Doing remote work or study 2 4
4 Under 18 Male Asia Yes Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chron... Doing exercise,Watching TV i.e. not the news,Doing housework,Playing music 2 0
In [145]:
df['Healthcare_Access'].value_counts(sort=False)
Out[145]:
1     8
2     6
0    54
3     4
4     6
5    13
6    11
Name: Healthcare_Access, dtype: int64
In [146]:
#*******************************************************************
#Was healthcare access required or not during the Covid 19 lockdown?
#*******************************************************************

"""Again, it is very evident from the barchart and value counts below, that most repondents (54) didn't require 
   access to healthcare during Covid 19 lockdown.  The gender split is very even and the majority
   (18 respondents) are in the 35-44 age band.
   Although, just under half of our respondents still wish access to healthcare. This can be highlighted by 
   the demand, by viewing the responses in categories 1 to 6 inclusive. 
   The minority of our respondents (4) are in category 3('I needed a GP appointment and could only get a 
   telephone or email one'), where there is an even split of males and females.  The ages span across from
   25 to 54 years old.
   """

counts=df['Healthcare_Access'].value_counts(sort=False)

sns.set(font_scale=1.0)
plt.figure(figsize=(10, 7),dpi=80)

plt.tick_params(labelsize=15);
plt.suptitle('Healthcare_Access Needed or Not', fontsize=18)


ax=counts.plot.barh()
ax.set_xlabel('Frequency',fontsize=16)
ax.set_ylabel('Healthcare_Access',fontsize=16)
plt.show()
In [147]:
pd.crosstab(df['Healthcare_Access'], df['Gender'], margins=True)
Out[147]:
Gender Female Male All
Healthcare_Access
1 3 5 8
2 2 4 6
0 26 28 54
3 2 2 4
4 2 4 6
5 7 6 13
6 4 7 11
All 46 56 102
In [148]:
pd.crosstab(df['Healthcare_Access'], df['Age'], margins=True)
Out[148]:
Age Under 18 18-24 25-34 35-44 45-54 55-64 All
Healthcare_Access
1 0 1 2 4 1 0 8
2 0 1 3 1 1 0 6
0 3 6 13 18 12 2 54
3 0 0 1 2 1 0 4
4 0 0 2 3 1 0 6
5 0 0 6 3 4 0 13
6 0 1 5 4 0 1 11
All 3 9 32 35 20 3 102

Q11. During the Covid 19 lockdown, which of the following supported activities helped you feel better?¶

In [149]:
#Lets use clustering to answer this particular question
#Agglomerative clustering is a bottom-up clustering technique and will try to make clusters of data points that 
#are close to each other. And a dendrogram is used to visualize all clusters in a tree format in such a way that 
#all data points (called leaves) are visible and nicely arranged at the same depth. So what you get is data points
#clustered together as well as visibility of all points

#*******************************************************************************************************************
#Which of the supported activities mentioned in Q11 helped our respondents feel better, during the Covid 19 lockdown?
#*******************************************************************************************************************

"""We used a clustering algorithm to find similar data/ answer question 11. 
    In the bar chart below, we can infer that the counts of Cluster 3 is higher. The supported activities 
    associated with Cluster 3 consist of waching tv, exercise, doing remote work or study, housework and sometimes
    talk with family and friends - refer to dataframe below, for verification.
    Hence, we can infer that during Covid 19 lockdown most of our respondents participated in these aforementioned 
    supported activities, to help them feel better
    """

#Lets use clustering to 
proc_wpt = nltk.WordPunctTokenizer()
stop_words = nltk.corpus.stopwords.words('english')

def text_normalization(col):

    col = re.sub(r'[^a-zA-Z\s]', '', col, re.I|re.A)
    col = col.lower()
    col = col.strip()

    tokens_no = proc_wpt.tokenize(col)

    filter_tokens = [token for token in tokens_no if token not in stop_words]

    col = ' '.join(filter_tokens)
    return col

normalize = np.vectorize(text_normalization)

normalize_data = normalize(df['Supported_Activities'])
normalize_data

#*References 10 and 11 (refer to the foot of this notebook) was used to summarize the above.
Out[149]:
array(['exercisewatching reading newstalking friends familydoing remote work study',
       'exercisewatching tv ie newsdoing remote work study',
       'exercisewatching tv ie newsdoing houseworkdoing remote work study',
       'exercisewatching reading newsdoing remote work study',
       'exercisewatching tv ie newsdoing houseworkplaying music',
       'watching reading newstalking friends family', 'remote work study',
       'playing music',
       'watching tv ie newswatching reading newsi didnt need help',
       'exercisedoing housework',
       'watching reading newsdoing remote work study',
       'talking friends family', 'didnt need help', 'didnt need help',
       'exercisewatching tv ie newstalking friends family',
       'watching reading newsdoing remote work study',
       'talking friends family',
       'talking friends familydoing remote work study',
       'talking friends family', 'watching reading news',
       'watching reading news',
       'watching reading newsdoing houseworktalking friends family',
       'didnt need help', 'housework',
       'watching reading newstalking friends family', 'didnt need help',
       'talking friends familydoing remote work study',
       'exercisewatching reading news', 'playing music',
       'didnt need help', 'watching tv ie newswatching reading news',
       'houseworkdoing remote work study', 'watching reading news',
       'didnt need help', 'exerciseplaying music', 'didnt need help',
       'talking friends familyplaying music',
       'exercisedoing remote work study',
       'exercisewatching reading newstalking friends family',
       'watching reading newsdoing housework',
       'watching tv ie newstalking friends family', 'didnt need help',
       'didnt need help',
       'talking friends familydoing remote work studyplaying music',
       'exercisewatching reading newsdoing housework',
       'watching reading newstalking friends family',
       'watching tv ie news',
       'watching tv ie newsdoing houseworkdoing remote work study',
       'playing music', 'remote work studyplaying music', 'exercise',
       'watching reading newstalking friends family',
       'talking friends family', 'houseworkdoing remote work study',
       'didnt need help', 'talking friends familyplaying music',
       'exercisedoing houseworkplaying musici didnt need help',
       'watching tv ie newstalking friends familydoing remote work studyplaying music',
       'exercisewatching reading newstalking friends family',
       'exercisewatching reading newsdoing remote work studyplaying music',
       'watching reading newstalking friends family',
       'talking friends familydoing remote work studyplaying music',
       'houseworktalking friends familydoing remote work studyplaying music',
       'watching reading newsdoing remote work study',
       'exercisewatching reading newstalking friends familydoing remote work study',
       'watching tv ie newstalking friends familydoing remote work study',
       'watching reading newstalking friends familydoing remote work studyi didnt need help',
       'exercisewatching tv ie newstalking friends family',
       'talking friends familyplaying music',
       'watching tv ie newstalking friends familydoing remote work study',
       'didnt need help', 'didnt need help',
       'talking friends familydoing remote work study',
       'watching tv ie newswatching reading newstalking friends familyplaying music',
       'watching tv ie newswatching reading newstalking friends familydoing remote work study',
       'talking friends familydoing remote work study',
       'watching tv ie newstalking friends familyplaying music',
       'exercisedoing houseworkdoing remote work studyplaying music',
       'watching tv ie newswatching reading newsdoing houseworkdoing remote work study',
       'exercisedoing houseworkdoing remote work study',
       'watching tv ie newsdoing remote work studyplaying music',
       'exercisewatching tv ie newswatching reading newsdoing remote work study',
       'watching tv ie news', 'watching reading newsplaying music',
       'watching reading newsdoing remote work study', 'didnt need help',
       'remote work study',
       'exercisewatching tv ie newstalking friends familydoing remote work study',
       'talking friends family',
       'watching tv ie newswatching reading news',
       'watching tv ie newswatching reading news', 'didnt need help',
       'exercisedoing housework',
       'exercisewatching reading newsdoing remote work study',
       'exercisewatching reading newstalking friends familydoing remote work study',
       'houseworktalking friends familydoing remote work studyplaying music',
       'exercisedoing houseworktalking friends familyplaying music',
       'watching reading newsdoing houseworktalking friends familydoing remote work study',
       'exercisewatching reading newsdoing houseworkplaying music',
       'watching tv ie newsdoing houseworktalking friends familydoing remote work study',
       'watching tv ie newswatching reading newsdoing remote work studyplaying music',
       'talking friends familydoing remote work studyplaying music'],
      dtype='<U85')
In [150]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
In [151]:
count_vector = CountVectorizer(min_df=0., max_df=1.)
count_vector_matrix = count_vector.fit_transform(normalize_data)
count_vector_matrix = count_vector_matrix.toarray()
count_vector_matrix
Out[151]:
array([[0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 1, 0, 1],
       [0, 0, 0, ..., 1, 0, 1],
       ...,
       [0, 0, 0, ..., 1, 1, 1],
       [0, 0, 0, ..., 1, 1, 1],
       [0, 0, 0, ..., 0, 0, 1]])
In [152]:
feature_names = count_vector.get_feature_names_out()
pd.DataFrame(count_vector_matrix, columns=feature_names)
Out[152]:
didnt exercise exercisedoing exerciseplaying exercisewatching family familydoing familyplaying friends help ... playing reading remote study studyi studyplaying talking tv watching work
0 0 0 0 0 1 0 1 0 1 0 ... 0 1 1 1 0 0 0 0 0 1
1 0 0 0 0 1 0 0 0 0 0 ... 0 0 1 1 0 0 0 1 0 1
2 0 0 0 0 1 0 0 0 0 0 ... 0 0 1 1 0 0 0 1 0 1
3 0 0 0 0 1 0 0 0 0 0 ... 0 1 1 1 0 0 0 0 0 1
4 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 0 0 0 0 0 0 1 0 1 0 ... 0 1 1 1 0 0 0 0 1 1
98 0 0 0 0 1 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
99 0 0 0 0 0 0 1 0 1 0 ... 0 0 1 1 0 0 0 1 1 1
100 0 0 0 0 0 0 0 0 0 0 ... 0 1 1 0 0 1 0 1 1 1
101 0 0 0 0 0 0 1 0 1 0 ... 0 0 1 0 0 1 1 0 0 1

102 rows × 34 columns

In [153]:
vector = CountVectorizer(ngram_range=(2,2))
vector_matrix = vector.fit_transform(normalize_data)
vector_matrix = vector_matrix.toarray()
feature_names = vector.get_feature_names_out()
pd.DataFrame(vector_matrix, columns=feature_names)
Out[153]:
didnt need exercisedoing housework exercisedoing houseworkdoing exercisedoing houseworkplaying exercisedoing houseworktalking exercisedoing remote exerciseplaying music exercisewatching reading exercisewatching tv familydoing remote ... remote work studyi didnt studyplaying music talking friends tv ie watching reading watching tv work study work studyi work studyplaying
0 0 0 0 0 0 0 0 1 0 1 ... 1 0 0 0 0 0 0 1 0 0
1 0 0 0 0 0 0 0 0 1 0 ... 1 0 0 0 1 0 0 1 0 0
2 0 0 0 0 0 0 0 0 1 0 ... 1 0 0 0 1 0 0 1 0 0
3 0 0 0 0 0 0 0 1 0 0 ... 1 0 0 0 0 0 0 1 0 0
4 0 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 1 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 0 0 0 0 0 0 0 0 0 1 ... 1 0 0 0 0 1 0 1 0 0
98 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
99 0 0 0 0 0 0 0 0 0 1 ... 1 0 0 0 1 0 1 1 0 0
100 0 0 0 0 0 0 0 0 0 0 ... 1 0 1 0 1 0 1 0 0 1
101 0 0 0 0 0 0 0 0 0 1 ... 1 0 1 1 0 0 0 0 0 1

102 rows × 49 columns

In [154]:
from sklearn.feature_extraction.text import TfidfVectorizer
Tfid_v = TfidfVectorizer(min_df=0., max_df=1., use_idf=True)
Tfid_v_matrix = Tfid_v.fit_transform(normalize_data)
Tfid_v_matrix = Tfid_v_matrix.toarray()

feature_names = Tfid_v.get_feature_names_out()
pd.DataFrame(np.round(Tfid_v_matrix, 2), columns=feature_names)
Out[154]:
didnt exercise exercisedoing exerciseplaying exercisewatching family familydoing familyplaying friends help ... playing reading remote study studyi studyplaying talking tv watching work
0 0.0 0.0 0.0 0.0 0.43 0.0 0.41 0.0 0.30 0.0 ... 0.0 0.32 0.30 0.35 0.0 0.00 0.00 0.00 0.00 0.30
1 0.0 0.0 0.0 0.0 0.44 0.0 0.00 0.0 0.00 0.0 ... 0.0 0.00 0.31 0.36 0.0 0.00 0.00 0.39 0.00 0.31
2 0.0 0.0 0.0 0.0 0.38 0.0 0.00 0.0 0.00 0.0 ... 0.0 0.00 0.27 0.31 0.0 0.00 0.00 0.34 0.00 0.27
3 0.0 0.0 0.0 0.0 0.49 0.0 0.00 0.0 0.00 0.0 ... 0.0 0.37 0.34 0.40 0.0 0.00 0.00 0.00 0.00 0.34
4 0.0 0.0 0.0 0.0 0.38 0.0 0.00 0.0 0.00 0.0 ... 0.0 0.00 0.00 0.00 0.0 0.00 0.00 0.34 0.00 0.00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 0.0 0.0 0.0 0.0 0.00 0.0 0.37 0.0 0.26 0.0 ... 0.0 0.28 0.26 0.31 0.0 0.00 0.00 0.00 0.29 0.26
98 0.0 0.0 0.0 0.0 0.42 0.0 0.00 0.0 0.00 0.0 ... 0.0 0.31 0.00 0.00 0.0 0.00 0.00 0.00 0.00 0.00
99 0.0 0.0 0.0 0.0 0.00 0.0 0.34 0.0 0.25 0.0 ... 0.0 0.00 0.25 0.29 0.0 0.00 0.00 0.31 0.27 0.25
100 0.0 0.0 0.0 0.0 0.00 0.0 0.00 0.0 0.00 0.0 ... 0.0 0.26 0.24 0.00 0.0 0.41 0.00 0.31 0.27 0.24
101 0.0 0.0 0.0 0.0 0.00 0.0 0.40 0.0 0.29 0.0 ... 0.0 0.00 0.29 0.00 0.0 0.49 0.45 0.00 0.00 0.29

102 rows × 34 columns

In [155]:
similar = cosine_similarity(Tfid_v_matrix)
df_of_similar = pd.DataFrame(similar)
df_of_similar
Out[155]:
0 1 2 3 4 5 6 7 8 9 ... 92 93 94 95 96 97 98 99 100 101
0 1.000000 0.498775 0.430419 0.673344 0.164119 0.434152 0.546106 0.000000 0.071785 0.0 ... 0.0 0.673344 1.000000 0.400388 0.079342 0.582514 0.276027 0.461073 0.225632 0.425897
1 0.498775 1.000000 0.862953 0.774599 0.588959 0.000000 0.567257 0.000000 0.211188 0.0 ... 0.0 0.774599 0.498775 0.169360 0.000000 0.425499 0.350102 0.642627 0.527706 0.180150
2 0.430419 0.862953 1.000000 0.668443 0.508244 0.000000 0.489516 0.000000 0.182246 0.0 ... 0.0 0.668443 0.430419 0.146149 0.000000 0.367186 0.302122 0.554557 0.455386 0.155461
3 0.673344 0.774599 0.668443 1.000000 0.360587 0.143804 0.633510 0.000000 0.083274 0.0 ... 0.0 1.000000 0.673344 0.189140 0.000000 0.578576 0.504758 0.444064 0.415581 0.201190
4 0.164119 0.588959 0.508244 0.360587 1.000000 0.000000 0.000000 0.170790 0.183394 0.0 ... 0.0 0.360587 0.164119 0.122269 0.118998 0.130403 0.833774 0.334619 0.436368 0.130058
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 0.582514 0.425499 0.367186 0.578576 0.130403 0.323364 0.485376 0.000000 0.131105 0.0 ... 0.0 0.578576 0.582514 0.636750 0.343894 1.000000 0.228563 0.860468 0.395050 0.378535
98 0.276027 0.350102 0.302122 0.504758 0.833774 0.121247 0.000000 0.185191 0.070211 0.0 ... 0.0 0.504758 0.276027 0.132579 0.129033 0.228563 1.000000 0.132135 0.326661 0.141025
99 0.461073 0.642627 0.554557 0.444064 0.334619 0.199219 0.453577 0.000000 0.231759 0.0 ... 0.0 0.444064 0.461073 0.595034 0.321364 0.860468 0.132135 1.000000 0.493575 0.353735
100 0.225632 0.527706 0.455386 0.415581 0.436368 0.207681 0.263471 0.154370 0.448947 0.0 ... 0.0 0.415581 0.225632 0.431309 0.107558 0.395050 0.326661 0.493575 1.000000 0.458788
101 0.425897 0.180150 0.155461 0.201190 0.130058 0.107210 0.317580 0.186073 0.000000 0.0 ... 0.0 0.201190 0.425897 0.753134 0.207619 0.378535 0.141025 0.353735 0.458788 1.000000

102 rows × 102 columns

In [156]:
#Plot dendrogram
#The number of clusters will be the number of vertical lines which are being intersected by the line drawn using 
#the threshold i.e. 3 clusters

#In a dendrogram, there are clades which indicate each branch, leaves which are the terminal end of each clade, 
#and outliers. Outliers are clades with only one leaf.

#Hence, there are no outliers in our Dendogram below



from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(similar, 'ward')

plt.figure(figsize=(15,15))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Supported Activities')
plt.ylabel('Distance')
dendrogram(Z)
plt.axhline(y=14, c='k', ls='--', lw=10)
Out[156]:
<matplotlib.lines.Line2D at 0x7ff5d01d8dc0>
In [157]:
#Dataframe with key variables; we're interested in 'Supported_Activities' associated with Cluster 3

from scipy.cluster.hierarchy import fcluster
distance = 8

clustering = fcluster(Z, distance, criterion='distance')
clustering = pd.DataFrame(clustering, columns=['Cluster'])
new_dataframe = pd.concat([df, clustering], axis=1)
new_dataframe
Out[157]:
Age Gender Region SOM_Affected Home Sharing Impact on Communication Impact on Financial Status Unhappy_or_Depressed Anxious or Low Confidence Stressed Study or Work Satisfaction Routine Satisfaction New Activities Satisfaction Hours Spent Outside Psychological_Reactions Supported_Activities Social_Support Healthcare_Access Cluster
0 35-44 Male Africa Yes Your partner,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 2-3 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacc... Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study 0 0 3
1 35-44 Male Europe Yes No one, I live alone NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions Doing exercise,Watching TV i.e. not the news,Doing remote work or study 0 0 3
2 35-44 Male Asia Yes Your partner NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching TV i.e. not the news,Doing housework,Doing remote work or study 1 1 3
3 35-44 Male Europe Yes Your partner,Your child/ children,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching/ Reading the news,Doing remote work or study 2 4 3
4 Under 18 Male Asia Yes Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chron... Doing exercise,Watching TV i.e. not the news,Doing housework,Playing music 2 0 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 Under 18 Male Africa Yes Other Family Memebers,One or multiple housemates NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Difficulty concentrating and making decisions Watching/ Reading the news,Doing housework,Talking to friends/ family,Doing remote work or study 2 0 3
98 25-34 Female Africa No No one, I live alone NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Increased use of alcohol, tobacco or other drugs Doing exercise,Watching/ Reading the news,Doing housework,Playing music 1 0 6
99 25-34 Male Europe Yes Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day None of the above Watching TV i.e. not the news,Doing housework,Talking to friends/ family,Doing remote work or study 0 6 3
100 35-44 Female South America Yes Your partner,Your child/ children,Other Family Memebers,One or multiple housemates NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,None of the above Watching TV i.e. not the news,Watching/ Reading the news,Doing remote work or study,Playing music 2 6 3
101 25-34 Male Asia Yes Your partner,Your child/ children,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Difficulty concentrating and making decisions Talking to friends/ family,Doing remote work or study,Playing music 2 5 2

102 rows × 19 columns

In [158]:
data = new_dataframe.groupby('Cluster').groups
data
Out[158]:
{1: [8, 12, 13, 22, 25, 29, 33, 35, 41, 42, 54, 56, 66, 70, 71, 85, 91], 2: [17, 26, 43, 49, 61, 62, 72, 75, 77, 95, 101], 3: [0, 1, 2, 3, 6, 10, 15, 31, 37, 47, 53, 57, 59, 63, 64, 65, 69, 74, 78, 79, 80, 81, 84, 86, 87, 93, 94, 97, 99, 100], 4: [5, 11, 14, 16, 18, 21, 24, 38, 40, 45, 51, 52, 58, 60, 67, 73, 76, 88], 5: [19, 20, 27, 30, 32, 46, 82, 89, 90], 6: [4, 7, 9, 23, 28, 34, 36, 39, 44, 48, 50, 55, 68, 83, 92, 96, 98]}
In [159]:
#Cluster all the responses

ar1 = []
ar2 = []
ar3 = []
for x, y in data.items():
    for i in range(len(y)):
        
        ar1.append(x)
        ar2.append(y[i])
        result1 = new_dataframe['Supported_Activities'].loc[y[i]]
        ar3.append(result1)
        
list_tuples = list(zip(ar1, ar2, ar3))

df_with_cluster = pd.DataFrame(list_tuples,
                  columns = ['Cluster', 'Index','Supported_Activities']) 
     
# Print data
df_with_cluster.to_csv('survey009.csv')

print(df_with_cluster.to_string())
     Cluster  Index                                                                                            Supported_Activities
0          1      8                                     Watching TV i.e. not the news,Watching/ Reading the news,I didn't need help
1          1     12                                                                                              I didn't need help
2          1     13                                                                                              I didn't need help
3          1     22                                                                                              I didn't need help
4          1     25                                                                                              I didn't need help
5          1     29                                                                                              I didn't need help
6          1     33                                                                                              I didn't need help
7          1     35                                                                                              I didn't need help
8          1     41                                                                                              I didn't need help
9          1     42                                                                                              I didn't need help
10         1     54                                                                                              I didn't need help
11         1     56                                                 Doing exercise,Doing housework,Playing music,I didn't need help
12         1     66             Watching/ Reading the news,Talking to friends/ family,Doing remote work or study,I didn't need help
13         1     70                                                                                              I didn't need help
14         1     71                                                                                              I didn't need help
15         1     85                                                                                              I didn't need help
16         1     91                                                                                              I didn't need help
17         2     17                                                           Talking to friends/ family,Doing remote work or study
18         2     26                                                           Talking to friends/ family,Doing remote work or study
19         2     43                                             Talking to friends/ family,Doing remote work or study,Playing music
20         2     49                                                                        Doing remote work or study,Playing music
21         2     61                                             Talking to friends/ family,Doing remote work or study,Playing music
22         2     62                             Doing housework,Talking to friends/ family,Doing remote work or study,Playing music
23         2     72                                                           Talking to friends/ family,Doing remote work or study
24         2     75                                                           Talking to friends/ family,Doing remote work or study
25         2     77                                         Doing exercise,Doing housework,Doing remote work or study,Playing music
26         2     95                             Doing housework,Talking to friends/ family,Doing remote work or study,Playing music
27         2    101                                             Talking to friends/ family,Doing remote work or study,Playing music
28         3      0                 Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study
29         3      1                                         Doing exercise,Watching TV i.e. not the news,Doing remote work or study
30         3      2                         Doing exercise,Watching TV i.e. not the news,Doing housework,Doing remote work or study
31         3      3                                            Doing exercise,Watching/ Reading the news,Doing remote work or study
32         3      6                                                                                      Doing remote work or study
33         3     10                                                           Watching/ Reading the news,Doing remote work or study
34         3     15                                                           Watching/ Reading the news,Doing remote work or study
35         3     31                                                                      Doing housework,Doing remote work or study
36         3     37                                                                       Doing exercise,Doing remote work or study
37         3     47                                        Watching TV i.e. not the news,Doing housework,Doing remote work or study
38         3     53                                                                      Doing housework,Doing remote work or study
39         3     57               Watching TV i.e. not the news,Talking to friends/ family,Doing remote work or study,Playing music
40         3     59                              Doing exercise,Watching/ Reading the news,Doing remote work or study,Playing music
41         3     63                                                           Watching/ Reading the news,Doing remote work or study
42         3     64                 Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study
43         3     65                             Watching TV i.e. not the news,Talking to friends/ family,Doing remote work or study
44         3     69                             Watching TV i.e. not the news,Talking to friends/ family,Doing remote work or study
45         3     74  Watching TV i.e. not the news,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study
46         3     78             Watching TV i.e. not the news,Watching/ Reading the news,Doing housework,Doing remote work or study
47         3     79                                                       Doing exercise,Doing housework,Doing remote work or study
48         3     80                                          Watching TV i.e. not the news,Doing remote work or study,Playing music
49         3     81              Doing exercise,Watching TV i.e. not the news,Watching/ Reading the news,Doing remote work or study
50         3     84                                                           Watching/ Reading the news,Doing remote work or study
51         3     86                                                                                      Doing remote work or study
52         3     87              Doing exercise,Watching TV i.e. not the news,Talking to friends/ family,Doing remote work or study
53         3     93                                            Doing exercise,Watching/ Reading the news,Doing remote work or study
54         3     94                 Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study
55         3     97                Watching/ Reading the news,Doing housework,Talking to friends/ family,Doing remote work or study
56         3     99             Watching TV i.e. not the news,Doing housework,Talking to friends/ family,Doing remote work or study
57         3    100               Watching TV i.e. not the news,Watching/ Reading the news,Doing remote work or study,Playing music
58         4      5                                                           Watching/ Reading the news,Talking to friends/ family
59         4     11                                                                                      Talking to friends/ family
60         4     14                                         Doing exercise,Watching TV i.e. not the news,Talking to friends/ family
61         4     16                                                                                      Talking to friends/ family
62         4     18                                                                                      Talking to friends/ family
63         4     21                                           Watching/ Reading the news,Doing housework,Talking to friends/ family
64         4     24                                                           Watching/ Reading the news,Talking to friends/ family
65         4     38                                            Doing exercise,Watching/ Reading the news,Talking to friends/ family
66         4     40                                                        Watching TV i.e. not the news,Talking to friends/ family
67         4     45                                                           Watching/ Reading the news,Talking to friends/ family
68         4     51                                                           Watching/ Reading the news,Talking to friends/ family
69         4     52                                                                                      Talking to friends/ family
70         4     58                                            Doing exercise,Watching/ Reading the news,Talking to friends/ family
71         4     60                                                           Watching/ Reading the news,Talking to friends/ family
72         4     67                                         Doing exercise,Watching TV i.e. not the news,Talking to friends/ family
73         4     73               Watching TV i.e. not the news,Watching/ Reading the news,Talking to friends/ family,Playing music
74         4     76                                          Watching TV i.e. not the news,Talking to friends/ family,Playing music
75         4     88                                                                                      Talking to friends/ family
76         5     19                                                                                      Watching/ Reading the news
77         5     20                                                                                      Watching/ Reading the news
78         5     27                                                                       Doing exercise,Watching/ Reading the news
79         5     30                                                        Watching TV i.e. not the news,Watching/ Reading the news
80         5     32                                                                                      Watching/ Reading the news
81         5     46                                                                                   Watching TV i.e. not the news
82         5     82                                                                                   Watching TV i.e. not the news
83         5     89                                                        Watching TV i.e. not the news,Watching/ Reading the news
84         5     90                                                        Watching TV i.e. not the news,Watching/ Reading the news
85         6      4                                      Doing exercise,Watching TV i.e. not the news,Doing housework,Playing music
86         6      7                                                                                                   Playing music
87         6      9                                                                                  Doing exercise,Doing housework
88         6     23                                                                                                 Doing housework
89         6     28                                                                                                   Playing music
90         6     34                                                                                    Doing exercise,Playing music
91         6     36                                                                        Talking to friends/ family,Playing music
92         6     39                                                                      Watching/ Reading the news,Doing housework
93         6     44                                                       Doing exercise,Watching/ Reading the news,Doing housework
94         6     48                                                                                                   Playing music
95         6     50                                                                                                  Doing exercise
96         6     55                                                                        Talking to friends/ family,Playing music
97         6     68                                                                        Talking to friends/ family,Playing music
98         6     83                                                                        Watching/ Reading the news,Playing music
99         6     92                                                                                  Doing exercise,Doing housework
100        6     96                                         Doing exercise,Doing housework,Talking to friends/ family,Playing music
101        6     98                                         Doing exercise,Watching/ Reading the news,Doing housework,Playing music
In [160]:
#Plot final cluster
#There are 6 clusters, cluster 3 is the mode i.e. highest frequency count 

import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(20,20))  

sns.displot(df_with_cluster, x="Cluster")
Out[160]:
<seaborn.axisgrid.FacetGrid at 0x7ff613ae9280>
<Figure size 2000x2000 with 0 Axes>

Q10. Select if you've had any of these psychological reactions towards the Covid 19 pandemic?¶

In [161]:
#******************************************************************************************************************
#Which of the psychological reactions mentioned in Q10 occurred with our respondents, during the Covid 19 lockdown?
#******************************************************************************************************************

"""We will use a clustering algorithm again, to find similar data/ answer question.  In the bar chart below, we can 
    infer that the counts of cluster 3 is higher.  The psychological reactions associated with Cluster 3 consists
    of Changes in appetite, energy and activity levels, difficulty sleeping or nightmares and concentrating and 
    making decisions - refer to dataframe below, for verification. 
    Hence, we can infer that during Covid 19 lockdown, most of our respondents suffered from the aforementioned
    psychological reactions.  
    """

proc_wpt = nltk.WordPunctTokenizer()
stop_words = nltk.corpus.stopwords.words('english')

def text_normalization(col):

    col = re.sub(r'[^a-zA-Z\s]', '', col, re.I|re.A)
    col = col.lower()
    col = col.strip()

    tokens_no = proc_wpt.tokenize(col)

    filter_tokens = [token for token in tokens_no if token not in stop_words]

    col = ' '.join(filter_tokens)
    return col

normalize = np.vectorize(text_normalization)

normalize_data = normalize(df['Psychological_Reactions'])
normalize_data
Out[161]:
array(['difficulty concentrating making decisionsdifficulty sleeping nightmaresphysical reactions headaches body pains stomach problems skin rashesincreased use alcohol tobacco drugs',
       'changes appetite energy activity levelsdifficulty concentrating making decisions',
       'changes appetite energy activity levelsdifficulty concentrating making decisionsworsening chronic health problemsincreased use alcohol tobacco drugs',
       'changes appetite energy activity levelsdifficulty sleeping nightmaresworsening chronic health problemsincreased use alcohol tobacco drugs',
       'changes appetite energy activity levelsdifficulty concentrating making decisionsphysical reactions headaches body pains stomach problems skin rashesworsening chronic health problems',
       'changes appetite energy activity levelsdifficulty concentrating making decisions',
       'changes appetite energy activity levelsphysical reactions headaches body pains stomach problems skin rashesworsening chronic health problems',
       'difficulty concentrating making decisionsdifficulty sleeping nightmaresincreased use alcohol tobacco drugs',
       'difficulty sleeping nightmaresphysical reactions headaches body pains stomach problems skin rashes',
       'none', 'difficulty concentrating making decisions',
       'increased use alcohol tobacco drugs',
       'difficulty sleeping nightmaresphysical reactions headaches body pains stomach problems skin rashes',
       'none', 'changes appetite energy activity levels',
       'changes appetite energy activity levelsdifficulty concentrating making decisionsdifficulty sleeping nightmares',
       'physical reactions headaches body pains stomach problems skin rashesworsening chronic health problems',
       'difficulty concentrating making decisions',
       'changes appetite energy activity levels',
       'difficulty concentrating making decisions',
       'difficulty concentrating making decisions', 'none',
       'changes appetite energy activity levelsphysical reactions headaches body pains stomach problems skin rashes',
       'difficulty concentrating making decisions',
       'difficulty sleeping nightmares', 'none',
       'increased use alcohol tobacco drugs', 'none',
       'changes appetite energy activity levels',
       'difficulty concentrating making decisions',
       'difficulty concentrating making decisions',
       'changes appetite energy activity levels',
       'changes appetite energy activity levelsdifficulty sleeping nightmaresincreased use alcohol tobacco drugs',
       'none', 'changes appetite energy activity levels', 'none',
       'physical reactions headaches body pains stomach problems skin rashes',
       'none',
       'changes appetite energy activity levelsworsening chronic health problems',
       'physical reactions headaches body pains stomach problems skin rashes',
       'none', 'increased use alcohol tobacco drugs', 'none',
       'changes appetite energy activity levelsdifficulty sleeping nightmares',
       'changes appetite energy activity levelsdifficulty concentrating making decisions',
       'physical reactions headaches body pains stomach problems skin rashes',
       'difficulty concentrating making decisions',
       'changes appetite energy activity levelsdifficulty concentrating making decisionsdifficulty sleeping nightmaresphysical reactions headaches body pains stomach problems skin rashes',
       'difficulty concentrating making decisions', 'none',
       'difficulty sleeping nightmares',
       'difficulty concentrating making decisionsphysical reactions headaches body pains stomach problems skin rashes',
       'physical reactions headaches body pains stomach problems skin rashes',
       'changes appetite energy activity levelsdifficulty sleeping nightmares',
       'worsening chronic health problems',
       'worsening chronic health problems',
       'difficulty concentrating making decisionsworsening chronic health problems',
       'changes appetite energy activity levelsdifficulty concentrating making decisionsphysical reactions headaches body pains stomach problems skin rashes',
       'difficulty concentrating making decisionsdifficulty sleeping nightmaresworsening chronic health problems',
       'difficulty concentrating making decisionsphysical reactions headaches body pains stomach problems skin rashesincreased use alcohol tobacco drugs',
       'none', 'none',
       'changes appetite energy activity levelsdifficulty concentrating making decisions',
       'difficulty concentrating making decisionsphysical reactions headaches body pains stomach problems skin rashesworsening chronic health problems',
       'changes appetite energy activity levelsdifficulty concentrating making decisions',
       'none',
       'difficulty concentrating making decisionsdifficulty sleeping nightmares',
       'none', 'none', 'none', 'none', 'none',
       'difficulty sleeping nightmaresincreased use alcohol tobacco drugs',
       'difficulty concentrating making decisionsdifficulty sleeping nightmares',
       'difficulty concentrating making decisionsdifficulty sleeping nightmaresworsening chronic health problems',
       'difficulty concentrating making decisionsdifficulty sleeping nightmaresworsening chronic health problems',
       'difficulty sleeping nightmaresphysical reactions headaches body pains stomach problems skin rashesincreased use alcohol tobacco drugs',
       'changes appetite energy activity levelsdifficulty concentrating making decisions',
       'changes appetite energy activity levelsdifficulty concentrating making decisionsdifficulty sleeping nightmares',
       'difficulty sleeping nightmaresincreased use alcohol tobacco drugs',
       'physical reactions headaches body pains stomach problems skin rashesincreased use alcohol tobacco drugs',
       'difficulty concentrating making decisions',
       'changes appetite energy activity levels',
       'difficulty sleeping nightmaresphysical reactions headaches body pains stomach problems skin rashes',
       'changes appetite energy activity levelsdifficulty sleeping nightmares',
       'physical reactions headaches body pains stomach problems skin rashes',
       'changes appetite energy activity levels',
       'difficulty concentrating making decisions', 'none',
       'physical reactions headaches body pains stomach problems skin rashes',
       'difficulty sleeping nightmaresphysical reactions headaches body pains stomach problems skin rashes',
       'none', 'changes appetite energy activity levels',
       'difficulty concentrating making decisionsdifficulty sleeping nightmares',
       'difficulty concentrating making decisionsdifficulty sleeping nightmares',
       'none',
       'changes appetite energy activity levelsdifficulty concentrating making decisions',
       'difficulty concentrating making decisions',
       'increased use alcohol tobacco drugs', 'none',
       'difficulty concentrating making decisionsdifficulty sleeping nightmaresnone',
       'difficulty concentrating making decisions'], dtype='<U181')
In [162]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
In [163]:
count_vector = CountVectorizer(min_df=0., max_df=1.)
count_vector_matrix = count_vector.fit_transform(normalize_data)
count_vector_matrix = count_vector_matrix.toarray()
count_vector_matrix
Out[163]:
array([[0, 1, 0, ..., 1, 1, 0],
       [1, 0, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 1, 1, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
In [164]:
feature_names = count_vector.get_feature_names_out()
pd.DataFrame(count_vector_matrix, columns=feature_names)
Out[164]:
activity alcohol appetite body changes chronic concentrating decisions decisionsdifficulty decisionsphysical ... rashes rashesincreased rashesworsening reactions skin sleeping stomach tobacco use worsening
0 0 1 0 1 0 0 1 0 1 0 ... 0 1 0 1 1 1 1 1 1 0
1 1 0 1 0 1 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 1 1 1 0 1 1 1 0 0 0 ... 0 0 0 0 0 0 0 1 1 0
3 1 1 1 0 1 1 0 0 0 0 ... 0 0 0 0 0 1 0 1 1 0
4 1 0 1 1 1 1 1 0 0 1 ... 0 0 1 1 1 0 1 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 0 0 0 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
98 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 1 0
99 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
100 0 0 0 0 0 0 1 0 1 0 ... 0 0 0 0 0 1 0 0 0 0
101 0 0 0 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 0

102 rows × 42 columns

In [165]:
vector = CountVectorizer(ngram_range=(2,2))
vector_matrix = vector.fit_transform(normalize_data)
vector_matrix = vector_matrix.toarray()
feature_names = vector.get_feature_names_out()
pd.DataFrame(vector_matrix, columns=feature_names)
Out[165]:
activity levels activity levelsdifficulty activity levelsphysical activity levelsworsening alcohol tobacco appetite energy body pains changes appetite chronic health concentrating making ... skin rashesworsening sleeping nightmares sleeping nightmaresincreased sleeping nightmaresnone sleeping nightmaresphysical sleeping nightmaresworsening stomach problems tobacco drugs use alcohol worsening chronic
0 0 0 0 0 1 0 1 0 0 1 ... 0 0 0 0 1 0 1 1 1 0
1 0 1 0 0 0 1 0 1 0 1 ... 0 0 0 0 0 0 0 0 0 0
2 0 1 0 0 1 1 0 1 1 1 ... 0 0 0 0 0 0 0 1 1 0
3 0 1 0 0 1 1 0 1 1 0 ... 0 0 0 0 0 1 0 1 1 0
4 0 1 0 0 0 1 1 1 1 1 ... 1 0 0 0 0 0 1 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
98 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 1 0
99 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
100 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 1 0 0 0 0 0 0
101 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 0 0

102 rows × 50 columns

In [166]:
from sklearn.feature_extraction.text import TfidfVectorizer
Tfid_v = TfidfVectorizer(min_df=0., max_df=1., use_idf=True)
Tfid_v_matrix = Tfid_v.fit_transform(normalize_data)
Tfid_v_matrix = Tfid_v_matrix.toarray()

feature_names = Tfid_v.get_feature_names_out()
pd.DataFrame(np.round(Tfid_v_matrix, 2), columns=feature_names)
Out[166]:
activity alcohol appetite body changes chronic concentrating decisions decisionsdifficulty decisionsphysical ... rashes rashesincreased rashesworsening reactions skin sleeping stomach tobacco use worsening
0 0.00 0.26 0.00 0.22 0.00 0.00 0.17 0.00 0.26 0.00 ... 0.0 0.35 0.00 0.22 0.22 0.20 0.22 0.26 0.26 0.0
1 0.35 0.00 0.35 0.00 0.35 0.00 0.30 0.40 0.00 0.00 ... 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
2 0.20 0.26 0.20 0.00 0.20 0.26 0.17 0.00 0.00 0.00 ... 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.26 0.26 0.0
3 0.20 0.27 0.20 0.00 0.20 0.27 0.00 0.00 0.00 0.00 ... 0.0 0.00 0.00 0.00 0.00 0.21 0.00 0.27 0.27 0.0
4 0.19 0.00 0.19 0.21 0.19 0.25 0.16 0.00 0.00 0.32 ... 0.0 0.00 0.34 0.21 0.21 0.00 0.21 0.00 0.00 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 0.00 0.00 0.00 0.00 0.00 0.00 0.45 0.61 0.00 0.00 ... 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
98 0.00 0.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ... 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.41 0.41 0.0
99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ... 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
100 0.00 0.00 0.00 0.00 0.00 0.00 0.27 0.00 0.42 0.00 ... 0.0 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0.0
101 0.00 0.00 0.00 0.00 0.00 0.00 0.45 0.61 0.00 0.00 ... 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0

102 rows × 42 columns

In [167]:
similar = cosine_similarity(Tfid_v_matrix)
df_of_similar = pd.DataFrame(similar)
df_of_similar
Out[167]:
0 1 2 3 4 5 6 7 8 9 ... 92 93 94 95 96 97 98 99 100 101
0 1.000000 0.100253 0.321276 0.313231 0.403088 0.100253 0.360272 0.599873 0.658482 0.0 ... 0.000000 0.376253 0.376253 0.0 0.100253 0.234855 0.420867 0.0 0.317268 0.234855
1 0.100253 1.000000 0.475736 0.383974 0.453282 1.000000 0.270199 0.131595 0.000000 0.0 ... 0.550987 0.192349 0.192349 0.0 1.000000 0.514250 0.000000 0.0 0.162195 0.514250
2 0.321276 0.475736 1.000000 0.801958 0.391156 0.475736 0.291541 0.421717 0.000000 0.0 ... 0.312990 0.109264 0.109264 0.0 0.475736 0.153032 0.426401 0.0 0.092135 0.153032
3 0.313231 0.383974 0.801958 1.000000 0.344995 0.383974 0.299184 0.411157 0.056271 0.0 ... 0.321196 0.080533 0.080533 0.0 0.383974 0.000000 0.437581 0.0 0.067908 0.000000
4 0.403088 0.453282 0.391156 0.344995 1.000000 0.453282 0.817433 0.071225 0.467645 0.0 ... 0.298218 0.104108 0.104108 0.0 0.453282 0.145809 0.000000 0.0 0.087787 0.145809
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 0.234855 0.514250 0.153032 0.000000 0.145809 0.514250 0.000000 0.308278 0.112356 0.0 ... 0.000000 0.450600 0.450600 0.0 0.514250 1.000000 0.000000 0.0 0.379960 1.000000
98 0.420867 0.000000 0.426401 0.437581 0.000000 0.000000 0.000000 0.552443 0.000000 0.0 ... 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 1.000000 0.0 0.000000 0.000000
99 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.0 ... 0.000000 0.000000 0.000000 1.0 0.000000 0.000000 0.000000 1.0 0.000000 0.000000
100 0.317268 0.162195 0.092135 0.067908 0.087787 0.162195 0.000000 0.416456 0.155207 0.0 ... 0.000000 0.608722 0.608722 0.0 0.162195 0.379960 0.000000 0.0 1.000000 0.379960
101 0.234855 0.514250 0.153032 0.000000 0.145809 0.514250 0.000000 0.308278 0.112356 0.0 ... 0.000000 0.450600 0.450600 0.0 0.514250 1.000000 0.000000 0.0 0.379960 1.000000

102 rows × 102 columns

In [168]:
#Plot dendrogram
#The number of clusters will be the number of vertical lines which are being intersected by the line drawn using 
#the threshold i.e. 3 clusters

##In a dendrogram, there are clades which indicate each branch, leaves which are the terminal end of each clade, 
#and outliers. Outliers are clades with only one leaf.

#We have a clade with only one leaf, in our dendogram below - blue branch and single leaf on extreme left hand side

from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(similar, 'ward')

plt.figure(figsize=(15,15))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Supported Activities')
plt.ylabel('Distance')
dendrogram(Z)
plt.axhline(y=20, c='k', ls='--', lw=10)
Out[168]:
<matplotlib.lines.Line2D at 0x7ff603ebe4c0>
In [169]:
#Dataframe with key variables; we're interested in the 'Psychological_Reactions' associated with Cluster 3

from scipy.cluster.hierarchy import fcluster
distance = 8

clustering = fcluster(Z, distance, criterion='distance')
clustering = pd.DataFrame(clustering, columns=['Cluster'])
new_dataframe_02 = pd.concat([df, clustering], axis=1)
new_dataframe_02
Out[169]:
Age Gender Region SOM_Affected Home Sharing Impact on Communication Impact on Financial Status Unhappy_or_Depressed Anxious or Low Confidence Stressed Study or Work Satisfaction Routine Satisfaction New Activities Satisfaction Hours Spent Outside Psychological_Reactions Supported_Activities Social_Support Healthcare_Access Cluster
0 35-44 Male Africa Yes Your partner,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 2-3 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacc... Doing exercise,Watching/ Reading the news,Talking to friends/ family,Doing remote work or study 0 0 2
1 35-44 Male Europe Yes No one, I live alone NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions Doing exercise,Watching TV i.e. not the news,Doing remote work or study 0 0 3
2 35-44 Male Asia Yes Your partner NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching TV i.e. not the news,Doing housework,Doing remote work or study 1 1 3
3 35-44 Male Europe Yes Your partner,Your child/ children,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs Doing exercise,Watching/ Reading the news,Doing remote work or study 2 4 3
4 Under 18 Male Asia Yes Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chron... Doing exercise,Watching TV i.e. not the news,Doing housework,Playing music 2 0 2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 Under 18 Male Africa Yes Other Family Memebers,One or multiple housemates NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Difficulty concentrating and making decisions Watching/ Reading the news,Doing housework,Talking to friends/ family,Doing remote work or study 2 0 4
98 25-34 Female Africa No No one, I live alone NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Increased use of alcohol, tobacco or other drugs Doing exercise,Watching/ Reading the news,Doing housework,Playing music 1 0 5
99 25-34 Male Europe Yes Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day None of the above Watching TV i.e. not the news,Doing housework,Talking to friends/ family,Doing remote work or study 0 6 1
100 35-44 Female South America Yes Your partner,Your child/ children,Other Family Memebers,One or multiple housemates NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,None of the above Watching TV i.e. not the news,Watching/ Reading the news,Doing remote work or study,Playing music 2 6 6
101 25-34 Male Asia Yes Your partner,Your child/ children,Other Family Memebers NaN NaN NaN NaN NaN NaN NaN NaN 0-1 hours per day Difficulty concentrating and making decisions Talking to friends/ family,Doing remote work or study,Playing music 2 5 4

102 rows × 19 columns

In [170]:
data = new_dataframe_02.groupby('Cluster').groups
data
Out[170]:
{1: [9, 13, 21, 25, 27, 33, 35, 37, 40, 42, 49, 60, 61, 65, 67, 68, 69, 70, 71, 88, 91, 95, 99], 2: [0, 4, 6, 8, 12, 16, 22, 36, 39, 45, 47, 51, 52, 57, 59, 63, 76, 80, 83, 85, 89, 90], 3: [1, 2, 3, 5, 14, 15, 18, 28, 31, 32, 34, 38, 43, 44, 53, 62, 64, 77, 78, 82, 84, 86, 92, 96], 4: [10, 17, 19, 20, 23, 29, 30, 46, 48, 81, 87, 97, 101], 5: [7, 11, 26, 41, 72, 79, 98], 6: [24, 50, 54, 55, 56, 58, 66, 73, 74, 75, 93, 94, 100]}
In [ ]:
 
In [171]:
ar1 = []
ar2 = []
ar3 = []

for x, y in data.items():
    for i in range(len(y)):
        
        ar1.append(x)
        ar2.append(y[i])
        result1 = new_dataframe_02['Psychological_Reactions'].loc[y[i]]
        ar3.append(result1)
        
list_tuples = list(zip(ar1, ar2, ar3)) 
    

list_tuples  
  

df_with_cluster_02= pd.DataFrame(list_tuples,
                  columns = ['Cluster', 'Index','Psychological_Reactions']) 
     
# Print data
#df_with_cluster_02.to_csv('survey0011.csv')

print(df_with_cluster_02.to_string())
     Cluster  Index                                                                                                                                                                                                 Psychological_Reactions
0          1      9                                                                                                                                                                                                       None of the above
1          1     13                                                                                                                                                                                                       None of the above
2          1     21                                                                                                                                                                                                       None of the above
3          1     25                                                                                                                                                                                                       None of the above
4          1     27                                                                                                                                                                                                       None of the above
5          1     33                                                                                                                                                                                                       None of the above
6          1     35                                                                                                                                                                                                       None of the above
7          1     37                                                                                                                                                                                                       None of the above
8          1     40                                                                                                                                                                                                       None of the above
9          1     42                                                                                                                                                                                                       None of the above
10         1     49                                                                                                                                                                                                       None of the above
11         1     60                                                                                                                                                                                                       None of the above
12         1     61                                                                                                                                                                                                       None of the above
13         1     65                                                                                                                                                                                                       None of the above
14         1     67                                                                                                                                                                                                       None of the above
15         1     68                                                                                                                                                                                                       None of the above
16         1     69                                                                                                                                                                                                       None of the above
17         1     70                                                                                                                                                                                                       None of the above
18         1     71                                                                                                                                                                                                       None of the above
19         1     88                                                                                                                                                                                                       None of the above
20         1     91                                                                                                                                                                                                       None of the above
21         1     95                                                                                                                                                                                                       None of the above
22         1     99                                                                                                                                                                                                       None of the above
23         2      0    Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacco or other drugs
24         2      4  Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chronic health problems
25         2      6                                                Changes in appetite, energy and activity levels,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chronic health problems
26         2      8                                                                                                   Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes
27         2     12                                                                                                   Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes
28         2     16                                                                                                Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chronic health problems
29         2     22                                                                                     Changes in appetite, energy and activity levels,Physical reactions, such as headaches, body pains, stomach problems and skin rashes
30         2     36                                                                                                                                     Physical reactions, such as headaches, body pains, stomach problems and skin rashes
31         2     39                                                                                                                                     Physical reactions, such as headaches, body pains, stomach problems and skin rashes
32         2     45                                                                                                                                     Physical reactions, such as headaches, body pains, stomach problems and skin rashes
33         2     47     Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes
34         2     51                                                                                       Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes
35         2     52                                                                                                                                     Physical reactions, such as headaches, body pains, stomach problems and skin rashes
36         2     57                                       Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes
37         2     59                                      Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacco or other drugs
38         2     63                                                  Difficulty concentrating and making decisions,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Worsening of chronic health problems
39         2     76                                                  Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacco or other drugs
40         2     80                                                                                    Physical reactions, such as headaches, body pains, stomach problems and skin rashes,Increased use of alcohol, tobacco or other drugs
41         2     83                                                                                                   Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes
42         2     85                                                                                                                                     Physical reactions, such as headaches, body pains, stomach problems and skin rashes
43         2     89                                                                                                                                     Physical reactions, such as headaches, body pains, stomach problems and skin rashes
44         2     90                                                                                                   Difficulty sleeping or nightmares,Physical reactions, such as headaches, body pains, stomach problems and skin rashes
45         3      1                                                                                                                           Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions
46         3      2                                     Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs
47         3      3                                                 Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Worsening of chronic health problems,Increased use of alcohol, tobacco or other drugs
48         3      5                                                                                                                           Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions
49         3     14                                                                                                                                                                         Changes in appetite, energy and activity levels
50         3     15                                                                                         Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Difficulty sleeping or nightmares
51         3     18                                                                                                                                                                         Changes in appetite, energy and activity levels
52         3     28                                                                                                                                                                         Changes in appetite, energy and activity levels
53         3     31                                                                                                                                                                         Changes in appetite, energy and activity levels
54         3     32                                                                                      Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares,Increased use of alcohol, tobacco or other drugs
55         3     34                                                                                                                                                                         Changes in appetite, energy and activity levels
56         3     38                                                                                                                                    Changes in appetite, energy and activity levels,Worsening of chronic health problems
57         3     43                                                                                                                                       Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares
58         3     44                                                                                                                           Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions
59         3     53                                                                                                                                       Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares
60         3     62                                                                                                                           Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions
61         3     64                                                                                                                           Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions
62         3     77                                                                                                                           Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions
63         3     78                                                                                         Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions,Difficulty sleeping or nightmares
64         3     82                                                                                                                                                                         Changes in appetite, energy and activity levels
65         3     84                                                                                                                                       Changes in appetite, energy and activity levels,Difficulty sleeping or nightmares
66         3     86                                                                                                                                                                         Changes in appetite, energy and activity levels
67         3     92                                                                                                                                                                         Changes in appetite, energy and activity levels
68         3     96                                                                                                                           Changes in appetite, energy and activity levels,Difficulty concentrating and making decisions
69         4     10                                                                                                                                                                           Difficulty concentrating and making decisions
70         4     17                                                                                                                                                                           Difficulty concentrating and making decisions
71         4     19                                                                                                                                                                           Difficulty concentrating and making decisions
72         4     20                                                                                                                                                                           Difficulty concentrating and making decisions
73         4     23                                                                                                                                                                           Difficulty concentrating and making decisions
74         4     29                                                                                                                                                                           Difficulty concentrating and making decisions
75         4     30                                                                                                                                                                           Difficulty concentrating and making decisions
76         4     46                                                                                                                                                                           Difficulty concentrating and making decisions
77         4     48                                                                                                                                                                           Difficulty concentrating and making decisions
78         4     81                                                                                                                                                                           Difficulty concentrating and making decisions
79         4     87                                                                                                                                                                           Difficulty concentrating and making decisions
80         4     97                                                                                                                                                                           Difficulty concentrating and making decisions
81         4    101                                                                                                                                                                           Difficulty concentrating and making decisions
82         5      7                                                                                        Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Increased use of alcohol, tobacco or other drugs
83         5     11                                                                                                                                                                        Increased use of alcohol, tobacco or other drugs
84         5     26                                                                                                                                                                        Increased use of alcohol, tobacco or other drugs
85         5     41                                                                                                                                                                        Increased use of alcohol, tobacco or other drugs
86         5     72                                                                                                                                      Difficulty sleeping or nightmares,Increased use of alcohol, tobacco or other drugs
87         5     79                                                                                                                                      Difficulty sleeping or nightmares,Increased use of alcohol, tobacco or other drugs
88         5     98                                                                                                                                                                        Increased use of alcohol, tobacco or other drugs
89         6     24                                                                                                                                                                                       Difficulty sleeping or nightmares
90         6     50                                                                                                                                                                                       Difficulty sleeping or nightmares
91         6     54                                                                                                                                                                                    Worsening of chronic health problems
92         6     55                                                                                                                                                                                    Worsening of chronic health problems
93         6     56                                                                                                                                      Difficulty concentrating and making decisions,Worsening of chronic health problems
94         6     58                                                                                                    Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Worsening of chronic health problems
95         6     66                                                                                                                                         Difficulty concentrating and making decisions,Difficulty sleeping or nightmares
96         6     73                                                                                                                                         Difficulty concentrating and making decisions,Difficulty sleeping or nightmares
97         6     74                                                                                                    Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Worsening of chronic health problems
98         6     75                                                                                                    Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,Worsening of chronic health problems
99         6     93                                                                                                                                         Difficulty concentrating and making decisions,Difficulty sleeping or nightmares
100        6     94                                                                                                                                         Difficulty concentrating and making decisions,Difficulty sleeping or nightmares
101        6    100                                                                                                                       Difficulty concentrating and making decisions,Difficulty sleeping or nightmares,None of the above
In [172]:
#Plot final cluster
#There are 6 clusters, cluster 3 is the mode i.e. highest frequency count 

import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(20,20))  

sns.displot(df_with_cluster_02, x="Cluster")
Out[172]:
<seaborn.axisgrid.FacetGrid at 0x7ff5c1ec1550>
<Figure size 2000x2000 with 0 Axes>

Conclusion and Discussion¶

Key Findings¶

From the 'State of Mind (SOM) Affected with Covid 19 Lockdown' barchart and value counts above, we derived that 73 of our repondent's (around 72% of our sample) SOM was affected by the Covid 19 Lockdown. It was also clear from the normalized stackedbar chart above, 'Percentage of State of Mind (SOM) Affected During the Covid 19 Lockdown' that males suffered more than females.

It was evident from the Normalized Stack Bar Chart, 'Percentage of State of Mind (SOM) Affected by Age' there was a trend. As the age bracket increased, our respondent's SOM was less affected. Hence, our younger respondents suffered the most and our older respondents suffered the least during the Covid 19 lockdown. Perhaps there's more resiliance with the older generation, e.g. experience/ handling of traumatic situations.

From the cross tabulation table and barchart, 'How Much Fresh Air Our Respondents Had During the Covid 19 Lockdown', it was refreshing to observe that just under half of our respondents (38 counts) managed to spend 4-5 hours outside during the lockdown. Thus, not all our repondents were trapped indoors with the strict rules and regulations in force.

Skewness¶

The ascertained skewness values of the columns, as evident from the above analysis, register below the threshold of 0.75. Consequently, the need to rectify skewness is obviated. It is pertinent to note that, should skewness exceed 0.75, redress through log transformation would be warranted.

Furthermore, the discernible negation of values in the columns or features is indicative of their mesokurtic nature, a characteristic that aligns with established statistical expectations. This phenomenon finds validation in the context of kurtosis, where a univariate normal distribution, characterized by a kurtosis of three, serves as a reference point. Given the kurtosis's deviation from three, a mesokurtic distribution demonstrates a propensity for generating fewer and less extreme outliers compared to the canonical normal distribution.

With the exception of XXXX

Outliers¶

Given the observed absence of substantial skewness within the columns or features, it follows that the presence of outliers can be deemed as relatively insignificant. Notably, a skewed distribution tends to engender a higher frequency of outliers.

With the exception of Q10, 'Select if you've had any of these psychological reactions towards the Covid 19 pandemic?'An outlier has been observed on the Dendogram, as we have a clade with only one leaf. This is also highlighted by it's associated barchart, where cluster 5 signifies 6 counts and attibutes to the skewness of the bar chart.

Correlation Scores and Heatmap¶

Spearman's correlation was used to summarize the linear relationship between any two data samples and provide some key insights (refer to above heatmap).

We opted for Spearman's Correlation over other correlations. The main difference between them is that Pearson is most appropriate for measurements from an interval scale and the Spearman correlation is appropriate for measurements taken from ordinal scales - the majority of our data types.

Research Questions¶

Research Q1:

How have the general population coped with the pandemic, as a whole? (Survey Questions 4,5,6,7 and 8)

The general population have coped with the pandemic reasonably well, according to the analysis above. Their communication has not been negatively impacted. The value counts and stacked barchart, 'Impact on Communication Level on Different Genders' above highlight that the majority of our respondents are stating the impact on communication 'remains the same' and 'more than usual' (providing a total of 70 respondents). There is an even balance between male and females, in all responses. Our communication was still on going, thanks to the Internet.

We observed our repondents are not feeling depressed or unhappy 'all' the time. The dataframe, df38 and likert chart, 'How Often Did You Feel?' inferred that the majority of our respondents felt 'unhappy/ depressed'(option 1), 'under stress'(option 2) and 'anxious/ a loss of confidence' (option 3), only 'Sometimes' (135 counts in total for all three options). 'Often' was the next highest frequency count (89 counts in total for all three options) for our respondents. Only a small minority (5 counts in total for all three options) 'never' experienced any of these symptoms.

However, that does not mean every situation is positive i.e. the 'Impact on Financial Status'. This has also been negatively impacted, as the financial status for the majority of respondents has declined. The value counts and stacked barchart,'Impact on Financial Status Level on Different Genders' above highlight that the impact on financial status is very significant.

The majority of respondents (48 repondents) are stating that their financial status has declined. This reflects the people who lost their jobs or were furloughed during the Covid 19 lockdown. The stacked barchart also highlights there are more males than females in this predicament.

The 'Study or Work Satisifaction Boxplot Based on Different Age Groups' highlights that most of our repondents were 'Very unsatisfied' with their study or work, during the Covid 19 lockdown. That is, age groups (under 18 to 54 years old) inclusive. This leaves age group (ages 55 to 64). This age group appears to be the only unique maximum out of all four options. Infering, the older generation may not get phased with traumatic experiences.

Research Q2:

What coping mechanisms have the general population adopted during the lockdown? (Survey Questions 9, 10 and 11)

Cluster analysis was utilised here, to identify the highest frequency of counts.

With respect to survey question 10, cluster 3 consists of changes in appetite, energy and activity levels, difficulty sleeping or nightmares and concentrating and making decisions. So, we can infer that during the Covid 19 lockdown, the majority of our respondents suffered these problems.

However, they had coping mechanisms to deal with these problems and the Covid 19 lockdown. With respect to survey question 11, cluster 3 consists of watching tv (but not news), doing exercise and housework. Some of them also like doing remote work or study.

Research Q:3

Is the healthcare system in general, adequate to deal with people with mental health issues? Especially during adverse advents such as a lock down? – Survey Questions 12 and 13.

It is very evident from the cross tabulation tables and 'Healthcare_Access Needed or Not' barchart above, that most repondents (54) didn't require access to healthcare during Covid 19 lockdown. The majority (18 respondents) are in the 35-44 age band. However, just under half of our respondents (total of 48 counts) still wish access to healthcare. This can be highlighted by viewing the responses in categories 1 to 6 inclusive.

With respect to survey question 10, It's quite evident from the cross tabulation tables and 'Social Support Needed or Not' barchart above, that most (57) of our repondents thought they didn't require social support during the Covid 19 lockdown. The majority (18 respondents) are in the 35-44 age band.

Objective

The prevailing consensus underscores a heightened demand for healthcare services vis-à-vis the domain of mental health services. Paradoxically, a substantial portion of our respondent cohort—approximately 72%—attests to the profound influence of the lockdown on their psychological well-being. Furthermore, the inquiry denoted as "survey question 10" effectively substantiates the prevalence of adverse repercussions and ancillary effects directly stemming from the imposed Covid-19 lockdown.

In light of these findings, we are poised to assert the viability of integrating online healthcare provisions, particularly telemedicine, or more broadly, intelligent healthcare solutions. Such an endeavor stands to yield multifaceted benefits, especially within a healthcare infrastructure grappling with an exacerbated backlog ensuing from the period of lockdown-induced operational constraints.

Future Research

Machine learning methodologies present a promising avenue to augment survey endeavors by furnishing additional insights via the facets of discovery, prediction, and optimization. As we progress, this trajectory affords an expanded capacity for amassing a more extensive dataset, thereby facilitating the deployment of diverse algorithms to effectuate heightened predictive acumen or classification accuracy.

An illustrative instance of our current analytical undertakings involves the utilization of hierarchical clustering—a robust technique to probe exploratory nuances when confronted with a limited pool of observations. Furthermore, a noteworthy prospective avenue for investigation lies in the realm of neural networks, assuming an adequate volume of observations. The versatility of neural networks manifests in their applicability to both unsupervised and supervised learning paradigms*.

Since the culmination of data acquisition, analysis, and visualization, notable advancements have transpired. Specifically, the development of a novel humanoid entity has taken shape, poised to address the arrears stemming from the Covid-19 pandemic. This sophisticated entity amalgamates capabilities spanning talk therapy, biometric data collection, and support for healthcare providers—an evolutionary stride emblematic of the intersection between technological innovation and healthcare exigencies [12 and 13].

(*Note: Neural networks' utility extends to both unsupervised and supervised learning scenarios.)

Evaluation¶

Following careful deliberation, we have resolved to streamline the selection of pivotal variables for prospective analyses. Specifically, we have opted to exclude certain variables, such as "daily life activities" and "new activities," corresponding to survey questions 8.2 and 8.3, respectively.

This deliberate omission stems from our assessment that these variables, while pertinent in their own right, do not substantially contribute to the analytical rigor sought. Instead, we have deemed the retained key variables—namely, survey questions 6.1, 6.2, 7.1, and 8.1—to collectively furnish an adequate foundation for conducting comprehensive statistical analyses. It is noteworthy that the judicious limitation of key variables is pivotal to circumventing potential complexities and preserving the analytical acuity essential to our investigatory pursuits.

References¶

  1. Waiting times and managing the backlog of appointments (including mental health services)

publications.parliament.uk/pa/cm5801/cmselect/cmhealth/320/32006.htm

  1. Cleaning, Analyzing, and Visualizing Survey Data in Python

towardsdatascience.com/cleaning-analyzing-and-visualizing-survey-data-in-python-42747a13c713

  1. How can I convert a 5- points Likert-scale for questionnaire in a survey to single numeric value?

researchgate.net/post/How-can-I-convert-a-5-points-Likert-scale-for-questionnaire-in-a-survey-to-single-numeric-value

  1. Mental well-being and association of the four factors coping structure model: A perspective of people living in lockdown during COVID-19

ncbi.nlm.nih.gov/pmc/articles/PMC7556853/

  1. Not Applicable

  2. How can I split the ylabel of a matplotlib figure into rows?

hstackoverflow.com/questions/43392988/how-can-i-split-the-ylabel-of-a-matplotlib-figure-into-rows

  1. How to put the legend out of the plot

stackoverflow.com/questions/4700614/how-to-put-the-legend-out-of-the-plot

  1. What is the significance of skewness?

kaggle.com/questions-and-answers/27334

  1. Survey on Data Analysis Using Correlation Methods

irjet.net/archives/V4/i4/IRJET-V4I4675.pdf

  1. Hierarchical Clustering with Python and Scikit-Learn

stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn

  1. A Beginner’s Guide to Hierarchical Clustering and how to Perform it in Python

analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/

  1. Neural Networks for Survey Researchers

surveypractice.org/article/2714-neural-networks-for-survey-researchers

  1. Meet Grace, the healthcare robot COVID-19 created

reuters.com/business/healthcare-pharmaceuticals/meet-grace-healthcare-robot-covid-19-created-2021-06-09/

Appendices¶

Word Count The following code will count the number of words in Markdown cells. Code cells are not included.

Main word count is the number of words in the main body of the text, before any appendices. Appendices word count is the number of words in any appendices.

In [ ]:
%%js

// Run this cell to update your word count.

function wordcount() {
    let wordCount = 0
    let appendixCount = 0
    let appendix = false

    let cells = Jupyter.notebook.get_cells()
    cells.forEach((cell) => {
        if (cell.cell_type == 'markdown') {
            let text = cell.get_text()
            // Stop counting when get to Appendices.
            if (text.startsWith('## Appendices')) {
                appendix = true
            }
            if (text.startsWith('## Word Count')) {
                text = ''
            }
            if (text) {
                let cellCount = text.toLowerCase().match(/\b[a-z\d]+\b/g).length
                if (!appendix) {
                    wordCount += cellCount
                } else {
                    appendixCount += cellCount
                }
            }
        }
    })
    return [wordCount, appendixCount]
}

let wc = wordcount()
element.append(`Main word count: ${wc[0]} (Appendices word count: ${wc[1]})`)