
Building a video insights generator using Gemini Flash


Video understanding or video insights are crucial across various industries and applications due to their multifaceted benefits. They enhance content analysis and management by automatically generating metadata, categorizing content, and making videos more searchable. Moreover, video insights provide critical data that drive decision-making, enhance user experiences, and improve operational efficiencies across diverse sectors.

Google’s Gemini 1.5 model brings significant advancements to this field. Beyond its impressive improvements in language processing, this model can handle an enormous input context of up to 1 million tokens. To further its capabilities, Gemini 1.5 is trained as a multimodal model, natively processing text, images, audio, and video. This powerful combination of varied input types and extensive context size opens up new possibilities for processing long videos effectively.

In this article, we will dive into how Gemini 1.5 can be leveraged for generating valuable video insights, transforming the way we understand and utilize video content across different domains.

Getting Started

Table of contents

  • What is Gemini 1.5
  • Prerequisites
  • Installing dependencies
  • Setting up the Gemini API key
  • Setting up the environment variables
  • Importing the libraries
  • Initializing the project
  • Saving uploaded files
  • Generating insights from videos
  • Upload a video to the Files API
  • Get File
  • Response Generation
  • Delete File
  • Combining the stages
  • Creating the interface
  • Creating the streamlit app

What is Gemini 1.5

Google’s Gemini 1.5 represents a significant leap forward in AI performance and efficiency. Building upon extensive research and engineering innovations, this model features a new Mixture-of-Experts (MoE) architecture, enhancing both training and serving efficiency. Available in public preview, Gemini 1.5 Pro and 1.5 Flash offer an impressive 1 million token context window through Google AI Studio and Vertex AI.

Google Gemini updates: Flash 1.5, Gemma 2 and Project Astra (blog.google)

The 1.5 Flash model, the newest addition to the Gemini family, is the fastest and most optimized for high-volume, high-frequency tasks. It is designed for cost-efficiency and excels in applications such as summarization, chat, image and video captioning, and extracting data from extensive documents and tables. With these advancements, Gemini 1.5 sets a new standard for performance and versatility in AI models.

Prerequisites

  • Python 3.9 (https://www.python.org/downloads)
  • google-generativeai
  • streamlit

Installing dependencies

  • Create and activate a virtual environment by executing the following commands.
python -m venv venv
source venv/bin/activate # for Linux/macOS
venv\Scripts\activate    # for Windows
  • Install the google-generativeai, streamlit, and python-dotenv libraries using pip. Note that google-generativeai requires Python 3.9 or later.
pip install google-generativeai streamlit python-dotenv

Setting up the Gemini API key

To access the Gemini API and begin working with its functionalities, you can acquire a free Google API Key by registering with Google AI Studio. Google AI Studio, offered by Google, provides a user-friendly, visual-based interface for interacting with the Gemini API. Within Google AI Studio, you can seamlessly engage with Generative Models through its intuitive UI, and if desired, generate an API Token for enhanced control and customization.

Follow the steps to generate a Gemini API key:

  • To initiate the process, you can either click the link (https://aistudio.google.com/app) to be redirected to Google AI Studio or perform a quick search on Google to locate it.
  • Accept the terms of service and click on continue.
  • Click the Get API key link in the sidebar, then click the Create API key in new project button to generate the key.
  • Copy the generated API key.
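
To quickly confirm the key works before wiring up the app, you can configure the SDK with it and list the models available to your account. This is an optional sketch; it assumes you paste the key directly for the test (the next section moves it into a .env file):

import google.generativeai as genai

# Paste your key here for a quick test only; do not commit it to source control.
genai.configure(api_key="AIzaSy......")

# List the models available to your key.
for model in genai.list_models():
    print(model.name)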


Setting up the environment variables

Begin by creating a new folder for your project. Choose a name that reflects the purpose of your project.
Inside your new project folder, create a file named .env. This file will store your environment variables, including your Gemini API key.
Open the .env file and add the following code to specify your Gemini API key:

GOOGLE_API_KEY=AIzaSy......

Importing the libraries

To get started with your project and ensure you have all the necessary tools, you need to import several key libraries as follows.

import os
import time
import google.generativeai as genai
import streamlit as st
from dotenv import load_dotenv
  • google.generativeai as genai: Imports the Google Generative AI library for interacting with the Gemini API.
  • streamlit as st: Imports Streamlit for creating web apps.
  • from dotenv import load_dotenv: Loads environment variables from a .env file.

Initializing the project

To set up your project, you need to configure the API key and create a directory for temporary file storage for uploaded files.

Define the media folder and configure the Gemini API key by initializing the necessary settings. Add the following code to your script:

MEDIA_FOLDER = 'medias'

def __init__():
    # Create the media directory if it doesn't exist
    if not os.path.exists(MEDIA_FOLDER):
        os.makedirs(MEDIA_FOLDER)

    # Load environment variables from the .env file
    load_dotenv()

    # Retrieve the API key defined in the .env file
    api_key = os.getenv("GOOGLE_API_KEY")

    # Configure the Gemini API with your API key
    genai.configure(api_key=api_key)

Saving uploaded files

To store uploaded files in the media folder and return their paths, define a method called save_uploaded_file and add the following code to it.

def save_uploaded_file(uploaded_file):
    """Save the uploaded file to the media folder and return the file path."""
    file_path = os.path.join(MEDIA_FOLDER, uploaded_file.name)
    with open(file_path, 'wb') as f:
        f.write(uploaded_file.read())
    return file_path

Generating insights from videos

Generating insights from videos involves several crucial stages, including uploading, processing, and response generation.

1. Upload a video to the Files API

The Gemini API accepts video files directly through the File API. The File API supports files up to 2GB in size and allows storage of up to 20GB per project. Uploaded files remain available for 2 days and cannot be downloaded from the API.

video_file = genai.upload_file(path=video_path)

2. Get File

After uploading a file, you can verify that the API has successfully received it by using the files.get method. This method allows you to view the files uploaded to the File API that are associated with the Cloud project linked to your API key. Only the file name and the URI are unique identifiers.

import time

while video_file.state.name == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = genai.get_file(video_file.name)

if video_file.state.name == "FAILED":
    raise ValueError(video_file.state.name)
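
The snippet above polls genai.get_file() until the video leaves the PROCESSING state. If you also want to see every file currently stored under your project, the SDK exposes a listing helper; a minimal sketch:

# List all files uploaded to the File API for this project.
for f in genai.list_files():
    print(f.name, f.uri)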

3. Response Generation

After the video has been uploaded, you can make GenerateContent requests that reference the File API URI.

# Create the prompt.
prompt = "Describe the video. Provide the insights from the video."

# Set the model to Gemini 1.5 Flash.
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

# Make the LLM request.
print("Making LLM inference request...")
response = model.generate_content([prompt, video_file],
                                  request_options={"timeout": 600})
print(response.text)

4. Delete File

Files are automatically deleted after 2 days or you can manually delete them using files.delete().

genai.delete_file(video_file.name)

5. Combining the stages

Create a method called get_insights that ties the four stages above together: upload the video, wait for processing, generate the response, and delete the file. Instead of print(), use Streamlit's st.write() method so the progress messages appear on the web page.
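
The following is a minimal sketch of such a method, assembled from the snippets above; the exact wording of the status messages and the st.subheader() call are illustrative assumptions rather than the original source:

def get_insights(video_path):
    """Extract insights from the video using Gemini 1.5 Flash."""
    st.write(f"Processing video: {video_path}")

    st.write("Uploading file...")
    video_file = genai.upload_file(path=video_path)
    st.write(f"Completed upload: {video_file.uri}")

    # Wait until the File API has finished processing the video.
    while video_file.state.name == "PROCESSING":
        st.write('Waiting for video to be processed.')
        time.sleep(10)
        video_file = genai.get_file(video_file.name)

    if video_file.state.name == "FAILED":
        raise ValueError(video_file.state.name)

    # Ask Gemini 1.5 Flash to describe the video.
    prompt = "Describe the video. Provide the insights from the video."
    model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

    st.write("Making LLM inference request...")
    response = model.generate_content([prompt, video_file],
                                      request_options={"timeout": 600})

    st.subheader("Insights")
    st.write(response.text)

    # Clean up the uploaded file once we have the response.
    genai.delete_file(video_file.name)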

Creating the interface

To streamline the process of uploading videos and generating insights within a Streamlit app, you can create a method named app. This method will provide an upload button, display the uploaded video, and generate insights from it.
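
A minimal sketch of such a method is shown below; the widget labels, the accepted file extensions, and the cleanup of the temporary file are assumptions chosen for illustration:

def app():
    st.title("Video Insights Generator")

    # Let the user pick a video file.
    uploaded_file = st.file_uploader("Upload a video file", type=["mp4", "avi", "mov", "mkv"])

    if uploaded_file is not None:
        # Save the video locally, show it, then run the insight pipeline.
        file_path = save_uploaded_file(uploaded_file)
        st.video(file_path)
        get_insights(file_path)

        # Optionally remove the temporary copy once insights are generated.
        if os.path.exists(file_path):
            os.remove(file_path)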

Creating the streamlit app

To create a complete and functional Streamlit application that allows users to upload videos and generate insights using the Gemini 1.5 Flash model, combine all the components into a single file named app.py.

Here is the final code:
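
The listing below is a sketch assembled from the snippets in the previous sections; the two calls at the bottom that start the app are an assumption about how the pieces are wired together (the original source is available in the repository linked at the end).

import os
import time
import google.generativeai as genai
import streamlit as st
from dotenv import load_dotenv

MEDIA_FOLDER = 'medias'

def __init__():
    """Create the media folder and configure the Gemini API key."""
    if not os.path.exists(MEDIA_FOLDER):
        os.makedirs(MEDIA_FOLDER)
    load_dotenv()
    api_key = os.getenv("GOOGLE_API_KEY")
    genai.configure(api_key=api_key)

def save_uploaded_file(uploaded_file):
    """Save the uploaded file to the media folder and return the file path."""
    file_path = os.path.join(MEDIA_FOLDER, uploaded_file.name)
    with open(file_path, 'wb') as f:
        f.write(uploaded_file.read())
    return file_path

def get_insights(video_path):
    """Upload the video, wait for processing, generate insights, then delete it."""
    st.write(f"Processing video: {video_path}")

    video_file = genai.upload_file(path=video_path)
    st.write(f"Completed upload: {video_file.uri}")

    while video_file.state.name == "PROCESSING":
        st.write('Waiting for video to be processed.')
        time.sleep(10)
        video_file = genai.get_file(video_file.name)

    if video_file.state.name == "FAILED":
        raise ValueError(video_file.state.name)

    prompt = "Describe the video. Provide the insights from the video."
    model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

    st.write("Making LLM inference request...")
    response = model.generate_content([prompt, video_file],
                                      request_options={"timeout": 600})

    st.subheader("Insights")
    st.write(response.text)

    genai.delete_file(video_file.name)

def app():
    st.title("Video Insights Generator")
    uploaded_file = st.file_uploader("Upload a video file", type=["mp4", "avi", "mov", "mkv"])

    if uploaded_file is not None:
        file_path = save_uploaded_file(uploaded_file)
        st.video(file_path)
        get_insights(file_path)
        if os.path.exists(file_path):
            os.remove(file_path)

__init__()
app()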

Running the application

Execute the following command to run the application.

streamlit run app.py

You can open the link provided in the console to see the output.


Thanks for reading this article !!

If you enjoyed this article, please click on the heart button ♥ and share to help others find it!

The full source code for this tutorial can be found here,

GitHub - codemaker2015/video-insights-generator

