Divorce Prediction Using Machine Learning

Gaurav Sethi
7 min read · Oct 11, 2019


In this blog, I will go through the whole process of creating machine learning models on the Divorce Predictors dataset, which provides the information used for divorce prediction.

The Dataset

The dataset comes from the UCI Machine Learning Repository and provides the information needed to predict divorce. It contains 54 features, and on the basis of these features we have to predict whether a couple is divorced or not. A value of 1 represents divorced and a value of 0 represents not divorced. The features are as follows:

  1. When one of us apologizes when our discussions go in a bad direction, the issue does not extend.
  2. I know we can ignore our differences, even if things get hard sometimes.
  3. When we need it, we can take our discussions with my wife from the beginning and correct it.
  4. When I argue with my wife, it will eventually work for me to contact him.
  5. The time I spent with my wife is special for us.
  6. We don’t have time at home as partners.
  7. We are like two strangers who share the same environment at home rather than family.
  8. I enjoy our holidays with my wife.
  9. I enjoy traveling with my wife.
  10. My wife and most of our goals are common.
  11. I think that one day in the future, when I look back, I see that my wife and I are in harmony with each other.
  12. My wife and I have similar values in terms of personal freedom.
  13. My husband and I have similar entertainment.
  14. Most of our goals for people (children, friends, etc.) are the same.
  15. Our dreams of living with my wife are similar and harmonious
  16. We’re compatible with my wife about what love should be
  17. We share the same views with my wife about being happy in your life
  18. My wife and I have similar ideas about how marriage should be
  19. My wife and I have similar ideas about how roles should be in marriage
  20. My wife and I have similar values in trust
  21. I know exactly what my wife likes.
  22. I know how my wife wants to be taken care of when she’s sick.
  23. I know my wife’s favorite food.
  24. I can tell you what kind of stress my wife is facing in her life.
  25. I have knowledge of my wife’s inner world.
  26. I know my wife’s basic concerns.
  27. I know what my wife’s current sources of stress are.
  28. I know my wife’s hopes and wishes.
  29. I know my wife very well.
  30. I know my wife’s friends and their social relationships.
  31. I feel aggressive when I argue with my wife.
  32. When discussing with my wife, I usually use expressions such as “you always” or “you never”.
  33. I can use negative statements about my wife’s personality during our discussions.
  34. I can use offensive expressions during our discussions.
  35. I can insult our discussions.
  36. I can be humiliating when we argue.
  37. My argument with my wife is not calm.
  38. I hate my wife’s way of bringing it up.
  39. Fights often occur suddenly.
  40. We’re just starting a fight before I know what’s going on.
  41. When I talk to my wife about something, my calm suddenly breaks.
  42. When I argue with my wife, it only snaps in and I don’t say a word.
  43. I mostly stay silent to calm the environment a little bit.
  44. Sometimes I think it’s good for me to leave home for a while.
  45. I’d rather stay silent than argue with my wife.
  46. Even if I’m right in the argument, I stay silent so as not to upset the other side.
  47. When I argue with my wife, I remain silent because I am afraid of not being able to control my anger.
  48. I feel right in our discussions.
  49. I have nothing to do with what I’ve been accused of.
  50. I’m not actually the one who’s guilty about what I’m accused of.
  51. I’m not the one who’s wrong about problems at home.
  52. I wouldn’t hesitate to tell her about my wife’s inadequacy.
  53. When I discuss it, I remind her of my wife’s inadequate issues.
  54. I’m not afraid to tell her about my wife’s incompetence.

The dataset itself contains 170 rows and 55 columns (the 54 features plus the class label).

Importing the Libraries
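A minimal sketch of the imports this kind of workflow typically needs, assuming pandas and scikit-learn are installed. The exact import list used in the original code is not shown here, and `GaussianNB` is an assumed choice of Naive Bayes variant.

```python
# Data handling
import numpy as np
import pandas as pd

# Splitting and evaluation
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Classifiers covered in this post
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.naive_bayes import GaussianNB  # assumed Naive Bayes variant
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
```

Keras, used for the neural network, would be imported separately.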

Data Preprocessing

We loaded our dataset into a pandas DataFrame. Using Python slicing, we then created two datasets from it: the data values, named x, and the labels of those values, named y. Finally, we divided the data into training and testing sets in three ratios: 80–20, 70–30 and 60–40.
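The steps above can be sketched as follows. To keep the example runnable without the data file, it builds a random stand-in DataFrame with the same shape (170 rows, 55 columns). The column names, the 0 to 4 answer scale and the `random_state` value are assumptions for illustration, not taken from the original code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the shape quoted above (170 rows, 55 columns).
# With the real file you would load it with pd.read_csv(...) instead.
rng = np.random.default_rng(0)
columns = [f"Atr{i}" for i in range(1, 55)] + ["Class"]  # hypothetical names
df = pd.DataFrame(rng.integers(0, 5, size=(170, 55)), columns=columns)
df["Class"] = rng.integers(0, 2, size=170)  # 1 = divorced, 0 = not divorced

# Slice the features (first 54 columns) and the label (last column).
x = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Three train/test ratios: 80-20, 70-30 and 60-40.
splits = {}
for test_size in (0.2, 0.3, 0.4):
    splits[test_size] = train_test_split(
        x, y, test_size=test_size, random_state=42
    )

for test_size, (x_train, x_test, y_train, y_test) in splits.items():
    print(test_size, x_train.shape, x_test.shape)
```

With 170 rows, the three test sets contain 34, 51 and 68 rows respectively.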

Building Machine Learning Models

Random Forest:

First, we took the random forest algorithm from scikit-learn. Then we fit the model on the data in three different ratios: 80–20, 70–30 and 60–40.
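A minimal sketch of this step, assuming `RandomForestClassifier` with 100 trees. The data is a synthetic stand-in generated with `make_classification` (170 rows, 54 features), so the printed numbers are not the results from this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 170 x 54 feature matrix (not the real data).
x, y = make_classification(n_samples=170, n_features=54, random_state=0)

results = {}
for test_size in (0.2, 0.3, 0.4):
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=42
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    results[test_size] = (
        accuracy_score(y_test, y_pred),
        confusion_matrix(y_test, y_pred),
    )

for test_size, (acc, cm) in results.items():
    print(f"test_size={test_size}: accuracy={acc:.3f}")
    print(cm)
```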

Confusion Matrix & Accuracy for this model

Accuracy and confusion matrix using the random forest classifier on the 3 different splits

Logistic Regression:

We took the logistic regression algorithm from scikit-learn. Then we fit the model on the data in three different ratios: 80–20, 70–30 and 60–40.
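A comparable sketch for logistic regression, again on synthetic stand-in data. The `max_iter=1000` setting is an assumption added so the solver has room to converge. It is not a parameter reported in this post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 170 x 54 feature matrix (not the real data).
x, y = make_classification(n_samples=170, n_features=54, random_state=0)

lr_results = {}
for test_size in (0.2, 0.3, 0.4):
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=42
    )
    model = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    lr_results[test_size] = {
        "accuracy": model.score(x_test, y_test),  # mean accuracy on the test set
        "confusion_matrix": confusion_matrix(y_test, model.predict(x_test)),
    }

for test_size, result in lr_results.items():
    print(test_size, round(result["accuracy"], 3))
    print(result["confusion_matrix"])
```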

Confusion Matrix & Accuracy for this model

Accuracy and confusion matrix using the logistic regression classifier on the 3 different splits

Neural Networks:

For the neural network, we used the Keras library.

We split the data 80–20 and then built our model.
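The architecture is not described here, so the following is only an illustrative sketch: a small fully connected network built with Keras (imported through TensorFlow) and trained on an 80–20 split of synthetic stand-in data. The layer sizes, optimizer, epochs and batch size are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow import keras

# Synthetic stand-in for the 170 x 54 feature matrix (not the real data).
x, y = make_classification(n_samples=170, n_features=54, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Hypothetical architecture: one hidden layer, sigmoid output for 0/1 labels.
model = keras.Sequential([
    keras.layers.Input(shape=(54,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=20, batch_size=16, verbose=0)
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f"test accuracy: {acc:.3f}")
```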

Naïve Bayes:

We took the Naive Bayes algorithm from scikit-learn. Then we fit the model on the data in three different ratios: 80–20, 70–30 and 60–40.
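scikit-learn ships several Naive Bayes variants and the one used is not named here. The sketch below assumes `GaussianNB` and uses synthetic stand-in data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the 170 x 54 feature matrix (not the real data).
x, y = make_classification(n_samples=170, n_features=54, random_state=0)

nb_accuracy = {}
for test_size in (0.2, 0.3, 0.4):
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=42
    )
    model = GaussianNB().fit(x_train, y_train)  # assumed variant
    nb_accuracy[test_size] = model.score(x_test, y_test)

print(nb_accuracy)
```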

Confusion Matrix & Accuracy for this Model

Accuracy and confusion matrix using the Naive Bayes classifier on the 3 different splits

Support Vector Machine:

We took the SVM algorithm from scikit-learn. Then we fit the model on the data in three different ratios: 80–20, 70–30 and 60–40.
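A sketch of the SVM step using `SVC`. The linear kernel is an assumption, since the kernel is not stated, and the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 170 x 54 feature matrix (not the real data).
x, y = make_classification(n_samples=170, n_features=54, random_state=0)

svm_results = {}
for test_size in (0.2, 0.3, 0.4):
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=42
    )
    model = SVC(kernel="linear")  # assumed kernel
    y_pred = model.fit(x_train, y_train).predict(x_test)
    svm_results[test_size] = (
        accuracy_score(y_test, y_pred),
        confusion_matrix(y_test, y_pred),
    )

for test_size, (acc, cm) in svm_results.items():
    print(f"test_size={test_size}: accuracy={acc:.3f}")
```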

Confusion Matrix & Accuracy for this model

Accuracy and confusion matrix using the SVM classifier on the 3 different splits

CART Decision Tree:

We took the decision tree algorithm from scikit-learn. Then we fit the model on the data in three different ratios: 80–20, 70–30 and 60–40.
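A sketch of the decision tree step. scikit-learn's `DecisionTreeClassifier` is a CART-style tree. The Gini criterion (its default) and the `random_state` value are assumptions, and the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 170 x 54 feature matrix (not the real data).
x, y = make_classification(n_samples=170, n_features=54, random_state=0)

tree_results = {}
for test_size in (0.2, 0.3, 0.4):
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=42
    )
    model = DecisionTreeClassifier(criterion="gini", random_state=42)
    model.fit(x_train, y_train)
    tree_results[test_size] = (
        model.score(x_test, y_test),
        confusion_matrix(y_test, model.predict(x_test)),
    )

for test_size, (acc, cm) in tree_results.items():
    print(f"test_size={test_size}: accuracy={acc:.3f}")
```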

Confusion Matrix & Accuracy for this model

Accuracy and confusion matrix using the decision tree classifier on the 3 different splits

Perceptron:

We took the Perceptron algorithm from scikit-learn. Then we fit the model on the data in three different ratios: 80–20, 70–30 and 60–40.
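A sketch of the perceptron step with scikit-learn's `Perceptron` at its default settings apart from an assumed `random_state`. The data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 170 x 54 feature matrix (not the real data).
x, y = make_classification(n_samples=170, n_features=54, random_state=0)

perceptron_accuracy = {}
for test_size in (0.2, 0.3, 0.4):
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=42
    )
    model = Perceptron(random_state=42).fit(x_train, y_train)
    perceptron_accuracy[test_size] = model.score(x_test, y_test)

print(perceptron_accuracy)
```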

Confusion Matrix & Accuracy:

Evaluation Metrics

  1. Accuracy - Accuracy is the ratio of the number of correct predictions to the total number of input samples. The accuracy has been displayed for each model above.
  2. Confusion Matrix - A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. It allows the visualization of the performance of an algorithm.
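To make the two definitions concrete, here is a small from-scratch sketch in plain Python. The label lists are made-up toy values, and the matrix uses rows for the actual class and columns for the predicted class, which is the layout scikit-learn's `confusion_matrix` also uses.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix_2x2(y_true, y_pred):
    """Rows = actual class, columns = predicted class (0 or 1)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Toy labels for illustration: 0 = not divorced, 1 = divorced.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

print(accuracy(y_true, y_pred))              # 0.75
print(confusion_matrix_2x2(y_true, y_pred))  # [[3, 1], [1, 3]]
```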

As an example, take the confusion matrix of the random forest classifier.

Reading the matrix with rows as actual classes and columns as predicted classes (scikit-learn's layout), the first row covers the couples who are actually not divorced: 19 were correctly classified as not divorced (true negatives) and 0 were wrongly classified as divorced (false positives). The second row covers the couples who are actually divorced: 2 were wrongly classified as not divorced (false negatives) and 13 were correctly classified as divorced (true positives).
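The accuracy implied by these four counts can be checked with a few lines of Python. The matrix below simply restates the counts quoted above, arranged with true negatives and false positives in the first row and false negatives and true positives in the second.

```python
# Counts quoted in the text for the random forest classifier.
cm = [[19, 0],   # 19 true negatives, 0 false positives
      [2, 13]]   # 2 false negatives, 13 true positives

tn, fp = cm[0]
fn, tp = cm[1]

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

print(total)               # 34
print(round(accuracy, 3))  # 0.941
```

The total of 34 test samples matches 20% of the 170 rows, which is consistent with the 80–20 split.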

So we have successfully trained different models on our dataset to predict whether a couple will get divorced or not, and obtained the accuracy and confusion matrix for each model as well.

I’m providing the link to the code, in case you want to check it out: https://drive.google.com/drive/folders/12cu1sXZvCQk57qu5TyMYZ7YgRTJ26GM8?usp=sharing
