Gemini 1.5 Pro: Steps to Analyze Audiovisuals & Craft Reviews

On April 9th, Google officially unveiled Gemini 1.5 Pro on its website, now available in over 180 countries/regions.

Apart from its ability to generate creative text and code, the standout feature of Gemini 1.5 Pro is its capability to deeply summarize uploaded videos and audio content based on user-entered text prompts, supporting up to 1 million tokens of context.

Currently, Gemini 1.5 Pro is available for free trial in the Google AI Studio development platform.

Furthermore, Google has optimized the Gemini API for performance, including system commands, JSON mode, and function call optimization, significantly enhancing the model’s stability and output capabilities.

5 Steps to Access Gemini 1.5 Pro (Free & No Waitlist)

How to / and Writing Film Reviews with Gemini 1.5 Pro?

I had the chance to experience the multimodal understanding capabilities of the latest Gemini 1.5 Pro through the Google AI Studio development platform. Here’s a simple guide on how to use it:

Log in to https://aistudio.google.com/app/prompts/new_chat and select the Gemini 1.5 Pro model, along with the Video feature.

Click on Video and choose Upload to upload a video.

Due to slow parsing speed for uploaded videos, I opted to use Google’s built-in video sample. It’s important to note that uploaded videos shouldn’t exceed 1 million tokens.

Using the built-in video sample, ask, “What is this film about?”
Gemini 1.5 Pro is parsing, typically taking only a few seconds to complete. The result is out: it’s a film called “Sherlock Jr.” starring and directed by Buster Keaton in 1924.
Continue by asking, “Can you write a 600-word review of this video?” After a short while, Gemini 1.5 Pro generates the review.

Although the generated content may not match that of top-tier reviewers, the overall article structure, narrative style, and vocabulary accuracy surpass that of many novice or intermediate reviewers. With slight modifications, it can become a great piece of content.

It’s worth mentioning that users can upload multiple videos simultaneously for interpretation, which is incredibly helpful for the video media industry, saving time in comprehending lengthy video content.

Let’s now try audio, with operations similar to video.

How to Understand Audio with Gemini 1.5 Pro?

Here, we’ll upload an English reading of an ESL Podcast course.

Then, upload the file in MP3 format.

Audio parsing is much faster than video; the audio we uploaded has approximately 120,000 tokens.

Begin by asking, “Summarize the content of this audio.”
Gemini 1.5 Pro has accurately interpreted it; the audio is the first lesson of the ESL Podcast series “A Day in the Life of Jeff,” aimed at helping learners grasp everyday English vocabulary.

Surprisingly, Gemini 1.5 Pro has interpreted the entire structure, story content, and learning objectives, indicating its strong understanding of English data content.

Gemini 1.5 Pro’s audio understanding also supports interpreting multiple files together.

Gemini API Enhancements

To empower developers in controlling the Gemini model better, Google has made three optimizations to the API.

System Commands: Users can now use system command functionality in both Google AI Studio and Gemini API to guide the model’s response output, enabling users to control the model’s behavior based on specific needs and use cases.
When setting up system commands, users need to provide additional context to the model to understand tasks and provide a higher degree of customization in responses, following specific guidelines throughout the interaction between the user and the model.
Developers, through system commands, can define roles, formats, objectives, and rules to guide the model in various behaviors within specific use cases.

JSON Mode: Gemini API now provides a configuration parameter to request JSON format responses, helping developers extract structured data from text or images.
Function Call Optimization: Developers can use custom functions and provide them to the AI model, which won’t directly call these functions but generate structured data output for suggested function names and parameters.

This output supports calling external APIs, and the generated API output can be reintegrated into the model, assisting developers in achieving more comprehensive query responses.

Gemini 1.5 Pro is open to everyone. Come test it out and see what it can do!

>> Further Reading:
8 Key Upgrades in GPT-4 Turbo Release Revealed
GPT-4 Turbo Goes Official! Explore GPT-4 Models
4 Steps to Use Free GPT-4 Turbo with Copilot
OpenAI Unveils Voice Engine
Demystifying Red Teaming: Top 10 Questions Answered!
5-Minute Read: Global AI News Roundup

Gemini 1.5 Pro Free Trial: Analyzing Audiovisuals and Crafting Reviews Made Easy

How to / and Writing Film Reviews with Gemini 1.5 Pro?

How to Understand Audio with Gemini 1.5 Pro?

Gemini API Enhancements

Leave a Comment Cancel Reply

How to / and Writing Film Reviews with Gemini 1.5 Pro?

How to Understand Audio with Gemini 1.5 Pro?

Gemini API Enhancements

Share Your Love

Leave a Comment Cancel Reply