A Guide to Understanding Interaction Terms

Introduction

Interaction terms are added to regression models to capture the combined effect of two or more independent variables on the dependent variable. Sometimes the question is not just how each predictor relates to the target on its own, and interaction terms are helpful in exactly those cases. They are useful whenever the relationship between one independent variable and the dependent variable is conditional on the level of another independent variable.

In other words, the effect of one predictor on the response variable depends on the level of another predictor. In this blog, we examine interaction terms through a simulated scenario: predicting how much time users spend on an e-commerce platform based on their past behavior.

Learning Objectives

  • Understand how interaction terms enhance the predictive power of regression models.
  • Learn to create and incorporate interaction terms in a regression analysis.
  • Analyze the impact of interaction terms on model accuracy through a practical example.
  • Visualize and interpret the effects of interaction terms on predicted outcomes.
  • Gain insights into when and why to apply interaction terms in real-world scenarios.

This article was published as a part of the Data Science Blogathon.

Table of contents

  • Introduction
  • Understanding the Basics of Interaction Terms
  • How Interaction Terms Influence Regression Coefficients?
  • Simulated Scenario: User Behavior on an E-Commerce Platform
  • Model Without an Interaction Term
  • Model With an Interaction Term
  • Comparing Model Performance
  • Conclusion
  • Frequently Asked Questions

Understanding the Basics of Interaction Terms

In real life, variables rarely act in isolation, so real-world models are more complex than the ones we study in class. For example, the effect of a navigation action such as adding items to a cart on the time spent on an e-commerce platform differs depending on whether the user also goes on to buy those items. Adding interaction terms to a regression model lets it account for these dependencies, which can improve how well it explains the patterns in the observed data and predicts future values of the dependent variable.

Mathematical Representation

Let’s consider a linear regression model with two independent variables, X1 and X2:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon,$$

where Y is the dependent variable, β0 is the intercept, β1 and β2 are the coefficients for the independent variables X1 and X2, respectively, and ϵ is the error term.

Adding an Interaction Term

To include an interaction term between X1 and X2, we introduce a new variable X1⋅X2:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \cdot X_2) + \epsilon,$$

where β3 represents the interaction effect between X1 and X2. The term X1⋅X2 is the product of the two independent variables.

How Interaction Terms Influence Regression Coefficients?

  • β0​: The intercept, representing the expected value of Y when all independent variables are zero.
  • β1​: The effect of X1​ on Y when X2​ is zero.
  • β2​: The effect of X2​ on Y when X1​ is zero.
  • β3​: The change in the effect of X1​ on Y for a one-unit change in X2​, or equivalently, the change in the effect of X2​ on Y for a one-unit change in X1.​
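The readings of β1, β2, and β3 listed above follow from asking how Y changes as one predictor changes while the other is held fixed. A short worked derivation, using the same symbols as the interaction model and treating the predictors as continuous for the purpose of differentiation:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \cdot X_2) + \epsilon \\
\frac{\partial Y}{\partial X_1} &= \beta_1 + \beta_3 X_2 \\
\frac{\partial Y}{\partial X_2} &= \beta_2 + \beta_3 X_1
\end{aligned}
```

Setting X2 = 0 leaves β1 as the effect of X1, and each one-unit increase in X2 shifts that effect by β3. For 0/1 predictors, the same expressions give the difference in expected Y between the two levels of one predictor at a fixed level of the other.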

Example: User Activity and Time Spent

First, let’s create a simulated dataset to represent user behavior on an online store. The data consists of:

  • added_in_cart: Indicates if a user has added products to their cart (1 for adding and 0 for not adding).
  • purchased: Whether or not the user completed a purchase (1 for completion or 0 for non-completion).
  • time_spent: The amount of time a user spent on the e-commerce platform.

Our goal is to predict the duration of a user’s visit to the online store based on whether they add products to their cart and complete a transaction.

# Import libraries
import pandas as pd
import numpy as np

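# Generate synthetic data
def generate_synthetic_data(n_samples=2000):
    """Simulate binary user actions and the resulting time spent on the site."""
    np.random.seed(42)  # fixed seed for reproducibility
    added_in_cart = np.random.randint(0, 2, n_samples)  # 1 if products were added to the cart
    purchased = np.random.randint(0, 2, n_samples)  # 1 if a purchase was completed
    # Baseline + main effects + interaction effect + Gaussian noise
    time_spent = 3 + 2*purchased + 2.5*added_in_cart + 4*purchased*added_in_cart + np.random.normal(0, 1, n_samples)
    return pd.DataFrame({'purchased': purchased, 'added_in_cart': added_in_cart, 'time_spent': time_spent})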

df = generate_synthetic_data()
df.head()

Output:

[Image: first five rows of the simulated DataFrame]
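As a quick sanity check on the simulation, the sketch below regenerates data from the same design and prints the average time_spent in each of the four user groups. It assumes the terms of the generating expression are summed (3 + 2·purchased + 2.5·added_in_cart + 4·purchased·added_in_cart + noise). It also uses NumPy's `default_rng` with an arbitrary seed instead of the article's `np.random.seed(42)`, so the exact numbers will differ from the article's data.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch, not the article's
n = 2000

added_in_cart = rng.integers(0, 2, n)  # 0/1 indicator
purchased = rng.integers(0, 2, n)      # 0/1 indicator

# Assumed design: baseline + main effects + interaction + unit-variance noise
time_spent = (3 + 2 * purchased + 2.5 * added_in_cart
              + 4 * purchased * added_in_cart + rng.normal(0, 1, n))

# Average time spent in each of the four (purchased, added_in_cart) groups
cell_means = {}
for p in (0, 1):
    for a in (0, 1):
        mask = (purchased == p) & (added_in_cart == a)
        cell_means[(p, a)] = float(time_spent[mask].mean())
        print(f"purchased={p}, added_in_cart={a}: mean time_spent = {cell_means[(p, a)]:.2f}")
```

Under this assumed design the group means should land near 3 (neither action), 5.5 (cart only), 5 (purchase only), and 11.5 (both). The last group sits 4 units above what the two separate effects would add up to, and that gap is the interaction.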

Simulated Scenario: User Behavior on an E-Commerce Platform

Next, we build an ordinary least squares (OLS) regression model that includes both user actions but not their interaction. The hypothesis behind this model is that each action affects the time spent on the website separately. We then construct a second model that adds the interaction term between adding products to the cart and making a purchase.

Comparing the two lets us weigh the impact of these actions, separately and combined, on the time spent on the website. In other words, we want to find out whether users who both add products to the cart and make a purchase spend more time on the site than the two behaviors considered individually would suggest.
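To see what the additive model cannot represent, one can fit both specifications to the four noise-free group means implied by the simulation. This is a minimal sketch rather than the article's code. It assumes the generating expression sums its terms, weights the four groups equally, and uses `np.linalg.lstsq` in place of statsmodels.

```python
import numpy as np

# Noise-free group means under the assumed design:
# time_spent = 3 + 2*purchased + 2.5*added_in_cart + 4*purchased*added_in_cart
purchased = np.array([0, 0, 1, 1])
added_in_cart = np.array([0, 1, 0, 1])
y = 3 + 2 * purchased + 2.5 * added_in_cart + 4 * purchased * added_in_cart

# Design matrices: intercept + main effects, then the same plus the product column
X_additive = np.column_stack([np.ones(4), purchased, added_in_cart])
X_interaction = np.column_stack([X_additive, purchased * added_in_cart])

# Ordinary least-squares fits
beta_additive, *_ = np.linalg.lstsq(X_additive, y, rcond=None)
beta_interaction, *_ = np.linalg.lstsq(X_interaction, y, rcond=None)

resid_additive = y - X_additive @ beta_additive
resid_interaction = y - X_interaction @ beta_interaction

print("Additive coefficients:   ", np.round(beta_additive, 3))
print("Additive residuals:      ", np.round(resid_additive, 3))
print("Interaction coefficients:", np.round(beta_interaction, 3))
print("Interaction residuals:   ", np.round(resid_interaction, 3))
```

With three parameters for four group means, the additive model misses every group by one unit, while the four-parameter interaction model reproduces the means exactly. That systematic one-unit miss on top of unit-variance noise is consistent with the test MSE values reported later in the article (about 2 without the interaction term and about 1 with it).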

Model Without an Interaction Term

After fitting the model, we observed the following:

  • The model without the interaction term has a test mean squared error (MSE) of 2.11 and accounts for roughly 80% (test R-squared) and 82% (train R-squared) of the variance in time_spent. An MSE of 2.11 means the squared prediction error averages 2.11, which corresponds to a typical error of about 1.45 units of time_spent. The model is reasonably accurate, but it can be improved upon.
  • The actual-vs-predicted plot shows that, although the model performs fairly well, there is still much room for improvement, especially in capturing higher values of time_spent.

# Import libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Model without interaction term
X = df[['purchased', 'added_in_cart']]
y = df['time_spent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Add a constant for the intercept
X_train_const = sm.add_constant(X_train)
X_test_const = sm.add_constant(X_test)

model = sm.OLS(y_train, X_train_const).fit()
y_pred = model.predict(X_test_const)

# Calculate metrics for model without interaction term
train_r2 = model.rsquared
test_r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("Model without Interaction Term:")
print('Training R-squared Score (%):', round(train_r2 * 100, 4))
print('Test R-squared Score (%):', round(test_r2 * 100, 4))
print("MSE:", round(mse, 4))
print(model.summary())


# Function to plot actual vs predicted
def plot_actual_vs_predicted(y_test, y_pred, title):

    plt.figure(figsize=(8, 4))
    plt.scatter(y_test, y_pred, edgecolors=(0, 0, 0))
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.title(title)
    plt.show()

# Plot without interaction term
plot_actual_vs_predicted(y_test, y_pred, 'Actual vs Predicted Time Spent (Without Interaction Term)')

Output:

[Images: printed metrics and OLS summary, and the actual-vs-predicted scatter plot, for the model without the interaction term]

Model With an Interaction Term

  • The scatter plot for the model with the interaction term shows predicted values substantially closer to the actual values, indicating a better fit.
  • With the interaction term, the model explains much more of the variance in time_spent: the test R-squared rises from 80.36% to 90.46%.
  • Its predictions are also more accurate, as shown by the lower MSE (down from 2.11 to 1.02).
  • The points align more closely with the diagonal line, particularly for higher values of time_spent. The interaction term helps express how the user actions jointly affect the amount of time spent.

# Add interaction term
df['purchased_added_in_cart'] = df['purchased'] * df['added_in_cart']
X = df[['purchased', 'added_in_cart', 'purchased_added_in_cart']]
y = df['time_spent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Add a constant for the intercept
X_train_const = sm.add_constant(X_train)
X_test_const = sm.add_constant(X_test)

model_with_interaction = sm.OLS(y_train, X_train_const).fit()
y_pred_with_interaction = model_with_interaction.predict(X_test_const)

# Calculate metrics for model with interaction term
train_r2_with_interaction = model_with_interaction.rsquared
test_r2_with_interaction = r2_score(y_test, y_pred_with_interaction)
mse_with_interaction = mean_squared_error(y_test, y_pred_with_interaction)

print("\nModel with Interaction Term:")
print('Training R-squared Score (%):', round(train_r2_with_interaction * 100, 4))
print('Test R-squared Score (%):', round(test_r2_with_interaction * 100, 4))
print("MSE:", round(mse_with_interaction, 4))
print(model_with_interaction.summary())


# Plot with interaction term
plot_actual_vs_predicted(y_test, y_pred_with_interaction, 'Actual vs Predicted Time Spent (With Interaction Term)')

# Print comparison
print("\nComparison of Models:")
print("R-squared without Interaction Term:", round(r2_score(y_test, y_pred)*100,4))
print("R-squared with Interaction Term:", round(r2_score(y_test, y_pred_with_interaction)*100,4))
print("MSE without Interaction Term:", round(mean_squared_error(y_test, y_pred),4))
print("MSE with Interaction Term:", round(mean_squared_error(y_test, y_pred_with_interaction),4))

Output:

[Images: printed metrics and OLS summary, and the actual-vs-predicted scatter plot, for the model with the interaction term]
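The interaction column above is built by hand as a product of two columns. An alternative, shown here only as a sketch, is the statsmodels formula interface, where `a * b` in a formula expands to both main effects plus their interaction. The data frame `df_demo`, the seed, and the summed generating expression are assumptions made for this example and are not the article's objects.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
n = 2000

# Hypothetical demo data following the assumed simulation design
df_demo = pd.DataFrame({
    "purchased": rng.integers(0, 2, n),
    "added_in_cart": rng.integers(0, 2, n),
})
df_demo["time_spent"] = (
    3 + 2 * df_demo["purchased"] + 2.5 * df_demo["added_in_cart"]
    + 4 * df_demo["purchased"] * df_demo["added_in_cart"]
    + rng.normal(0, 1, n)
)

# "purchased * added_in_cart" = purchased + added_in_cart + purchased:added_in_cart
fit = smf.ols("time_spent ~ purchased * added_in_cart", data=df_demo).fit()
print(fit.params)
```

The fitted parameter labelled `purchased:added_in_cart` plays the role of β3, and the intercept is added automatically, so no call to `sm.add_constant` is needed in this form.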

Comparing Model Performance

  • The blue points represent the predictions of the model without the interaction term. When the actual time spent is higher, these points are more dispersed from the diagonal line.
  • The red points represent the predictions of the model with the interaction term. This model produces more accurate predictions, especially for higher actual time spent values, where its points lie closer to the diagonal line.

# Compare models with and without the interaction term

def plot_actual_vs_predicted_combined(y_test, y_pred1, y_pred2, title1, title2):

    plt.figure(figsize=(10, 6))
    plt.scatter(y_test, y_pred1, color='blue', label=title1, alpha=0.6)
    plt.scatter(y_test, y_pred2, color='red', label=title2, alpha=0.6)
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.title('Actual vs Predicted User Time Spent')
    plt.legend()
    plt.show()

plot_actual_vs_predicted_combined(y_test, y_pred, y_pred_with_interaction, 'Model Without Interaction Term', 'Model With Interaction Term')

Output:

[Image: combined actual-vs-predicted scatter plot for both models]
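MSE is measured in squared units, so the two reported values are easier to compare after taking the square root. The short sketch below converts the test MSE figures quoted in this article (2.11 and 1.02) to root mean squared error. The variable names are illustrative only.

```python
import math

# Test-set MSE values reported in the article
mse_without = 2.11
mse_with = 1.02

# RMSE is on the same scale as time_spent
rmse_without = math.sqrt(mse_without)
rmse_with = math.sqrt(mse_with)

print(f"RMSE without interaction term: {rmse_without:.2f}")
print(f"RMSE with interaction term:    {rmse_with:.2f}")
```

Since the simulated noise has a standard deviation of 1, an RMSE close to 1 suggests the model with the interaction term is already near the best accuracy this data allows.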

Conclusion

The improvement in performance with the interaction term demonstrates that adding interaction terms can sometimes enhance a model's explanatory and predictive power. This example highlights how interaction terms capture information that is not apparent from the main effects alone. In practice, considering interaction terms in regression models can lead to more accurate and insightful predictions.

In this blog, we first generated a synthetic dataset to simulate user behavior on an e-commerce platform. We then constructed two regression models: one without interaction terms and one with interaction terms. By comparing their performance, we demonstrated the significant impact of interaction terms on the accuracy of the model.

Check out the full code and resources on GitHub.

Key Takeaways

  • Regression models with interaction terms can help to better understand the relationships between two or more variables and the target variable by capturing their combined effects.
  • Including interaction terms can significantly improve model performance, as evidenced by higher R-squared values and lower MSE in this guide.
  • Interaction terms are not just theoretical concepts; they can be applied to real-world scenarios.

Frequently Asked Questions

Q1. What are interaction terms in regression analysis?

A. They are variables created by multiplying two or more independent variables. They are used to capture the combined effect of these variables on the dependent variable. This can provide a more nuanced understanding of the relationships in the data.

Q2. When should I consider using interaction terms in my model?

A. You should consider using interaction terms when you suspect that the effect of one independent variable on the dependent variable depends on the level of another independent variable. For example, if you believe that the impact of adding items to the cart on the time spent on an e-commerce platform depends on whether the user makes a purchase, you should include an interaction term between these variables.

Q3. How do I interpret the coefficients of interaction terms?

A. The coefficient of an interaction term represents the change in the effect of one independent variable on the dependent variable for a one-unit change in another independent variable. In the example above, which has an interaction term between purchased and added_in_cart, the coefficient tells us how the effect of adding items to the cart on time spent changes when a purchase is made.
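As a small worked example of this interpretation, the sketch below computes the effect of adding items to the cart separately for users who did and did not purchase. It uses the coefficients from the simulation's generating expression (2.5 for added_in_cart and 4 for the interaction, assuming the terms are summed), not fitted values. The helper `conditional_effect` is a hypothetical name introduced only for this illustration.

```python
def conditional_effect(main_coef, interaction_coef, other_value):
    """Effect of one predictor at a given value of the other: beta1 + beta3 * x2."""
    return main_coef + interaction_coef * other_value

beta_added_in_cart = 2.5  # main effect of added_in_cart in the assumed design
beta_interaction = 4.0    # coefficient on purchased * added_in_cart

effects = {}
for purchased in (0, 1):
    effects[purchased] = conditional_effect(beta_added_in_cart, beta_interaction, purchased)
    print(f"purchased={purchased}: effect of added_in_cart on time_spent = {effects[purchased]}")
```

Under these assumed coefficients, adding items to the cart is associated with 2.5 extra units of time for non-purchasers and 6.5 for purchasers. The difference between the two is the interaction coefficient.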

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
