Live Project: IPL Data Analysis using Pandas AI

This project explores the power of Pandas AI, a generative AI tool that enables natural language querying on DataFrames using LLMs. The focus was to analyze the IPL 2023 Auction dataset and derive insights such as top buys, team-wise spending, unsold players, and player categories – all using simple English prompts instead of complex Python code. The project demonstrates how LLM-powered tools can simplify data exploration, visualization, and pattern recognition in sports analytics.

The main aim was to:

  • Perform data-driven insights using conversational prompts

  • Visualize financial trends across teams

  • Identify patterns in player selection & spending

  • Understand the limitations of AI-driven data analysis


Project Implementation:

  1. Prerequisites Setup:

    • Installed pandasai using !pip install -q pandasai

    • Ensured OpenAI API access was set up

  2. Library Imports:

    • Imported essential modules: pandas, SmartDataframe from pandasai, and OpenAI from pandasai.llm.openai

    • Initialized OpenAI LLM with API key

    • Created a SmartDataframe instance for intelligent querying

  3. Dataset Loading:

    • Used the IPL_Squad_2023_Auction_Dataset.csv dataset

    • Loaded the data into a DataFrame using pd.read_csv()

    • Explored data shape & top rows using .shape and .head()

  4. Interactive Data Analysis with Prompts:

    • Used .chat() method to interact with data using natural language

    • Ran a series of prompts to extract various insights:

      • Most expensive & cheapest buys

      • Team-wise expenditure

      • Count of unsold bowlers & their base prices

      • Types of unsold players

      • New players picked by Gujarat

      • Team-specific bar plots for spending on player types

      • Total money spent by all teams

      • Prediction query for Sam Curran’s 2024 buyer

      • Univariate & multivariate analysis

  5. Graphical Insights:

    • Used LLM-driven commands to auto-generate bar graphs showing team-wise spending

    • Visualized how much teams like Mumbai Indians and Gujarat Titans spent on different player categories

  6. Observations:

    • Pandas AI is highly effective for quick exploratory tasks, direct insights & basic plots

    • It struggles with complex analytics like multivariate analysis, outlier detection, or ambiguous queries

    • Response time is slower due to API latency

    • Still a great tool for beginners to interact with data using English instead of code


Key Learnings & Outcomes:

  • Understood the application of LLMs in structured data analysis

  • Learned how to perform insights & visualizations using prompts

  • Identified pros & cons of using Pandas AI for real-world datasets

  • Gained experience in handling sports data & auction-based analytics

  • Demonstrated how GenAI tools can assist in data science workflows

Step 1: Prerequisites

Before starting ensure that pandasAI and openai libraries are installed. Run the following command in your command prompt:


!pip install -q pandasai

Step 2: Importing necessary libraries


import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm.openai import OpenAI

Step 3: Initializing an instance of OpenAI LLM and pass it’s API key


# replace "your_api_key" with your generated key
OPENAI_API_KEY = "your_api_key"
sdf = SmartDataframe(df, config={"llm": llm})

Step 4: Importing the IPL 2023 Auction dataset using pandas

We are using the IPL 2023 Auction dataset here. You can download dataset from kaggle.


df = pd.read_csv('IPL_Squad_2023_Auction_Dataset.csv')
print(df.shape)
df.head()

Step 6: Data Analysis using PandasAI

Now let’s begin our analysis:

Prompt 1:


sdf.chat(df, prompt="Which players are the most costliest buys?")

Prompt 2:


prompts = """
Which players were the cheapest buys this season and which team bought them?
"""
sdf.chat(df, prompt=prompts)

Prompt 3:


prompts = """
Draw a Bargraph showing How much money was spent by each team this season overall.
"""
sdf.chat(df, prompt=prompts)

Prompt 4:


sdf.chat(df, prompt="How many bowler remained unsold and what was their base price?")

Prompt 5:


sdf.chat(df, prompt="How many players remained unsold this season?")

Prompt 6:


sdf.chat(df, prompt="Which type of players were majorly unsold?")

Prompt 7:


sdf.chat(df, prompt="Who are three new players Gujrat picked?")                 

Promopt 8:


sdf.chat(df, prompt="What is total money spent by all teams in dollars?")               

Prompt 9:


prompts = """
draw a barplot showing 
how much money was spent by Mumbai Indians on all types of players?
"""
pandas_ai.run(df, prompt=prompts)               

Prompt 10:


prompts = """
draw a barplot showing how much money was spent by Gujrat on all types of players?
"""
pandas_ai.run(df, prompt=prompts)             

Prompt 11:


sdf.chat(df, prompt="Can you predict which team will buy Sam Curran in 2024?")           

Prompt 12:


sdf.chat(df, prompt="Perform univariate analysis")

Prompt 13:


sdf.chat(df, prompt="Perform multivariate analysis")          

For this input PandasAI seems to have failed as the complexity and ambiguity increased.

Pros of Pandas AI

  • Pandas AI works well on direct and well-explained inputs.
  • Can easily perform simple tasks like plotting graphs and univariate analysis.
  • Can perform basic statistical operations.
  • Also, can make basic predictions sometimes.

Cons of Pandas AI

  • Cannot process ambiguous inputs.
  • It uses server data, So, that is slower as compared to pandas.
  • Cannot perform complex tasks like outlier analysis or multivariate analysis.
 

©2025 All Rights Reserved PrimePoint Institute