Databricks Assistant Year in Review

December 20, 2024


Since its launch in 2023, Databricks Assistant has grown to hundreds of thousands of monthly users, including developers at major enterprises like Rivian, SiriusXM, and Morgan Stanley. Our context-aware AI assistant, available natively within Databricks, allows users to query data, explain complex logic, and automatically fix errors using only natural language.

Databricks Assistant is an agentic system that leverages multiple AI models, data and tools to provide accurate and contextual results, based on the semantics of your data and usage patterns. In the last year, we’ve introduced many new features and improvements to the Databricks Assistant. Let’s take a look at some of the highlights and show you what’s coming next in 2025.

Assistant Autocomplete

Assistant Autocomplete helps users write code faster and with greater accuracy by providing context-aware suggestions as they type. Since its launch, we’ve introduced several technical enhancements to improve its accuracy and usability. These include personalized code retrieval and multi-line completions. We’ve also enhanced context evaluation and ranking to better account for neighboring cells, tables, and variables, ensuring suggestions are more relevant. Finally, we’ve increased our character limit, enabling it to generate longer and more complete code suggestions, while refining truncation mechanisms to display full lines of code more consistently.


“While I’m generally a bit of a GenAI skeptic, I’ve found that the Databricks Assistant Autocomplete tool is one of the very few actually great use cases for the technology. It is generally fast and accurate enough to save me a meaningful number of keystrokes, allowing me to focus more fully on the reasoning task at hand instead of typing. Additionally, it has almost entirely replaced my regular trips to the internet for boilerplate-like API syntax (e.g. plot annotation, etc).” – Jonas Powell, Staff Data Scientist, Rivian

Error Diagnosis and Quick Fixes

This year, we enhanced our most popular use case—diagnosing code errors—by introducing Assistant Quick Fix. Focusing on the most common error types, such as syntax issues and misspelled table or column names, the Assistant now automatically generates single-line correction suggestions in just 1-3 seconds.
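To make the idea of a single-line correction for a misspelled column name concrete, here is a minimal sketch. It is not how Assistant Quick Fix is implemented; it only illustrates the general technique of matching a misspelled identifier against known names, using Python's standard `difflib` module. The function names, the sample columns, and the similarity cutoff are all hypothetical.

```python
import difflib


def suggest_column(misspelled, known_columns, cutoff=0.6):
    """Return the closest known column name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(misspelled, known_columns, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def quick_fix(line, misspelled, known_columns):
    """Return the line with the misspelled name replaced, or None if no suggestion exists."""
    suggestion = suggest_column(misspelled, known_columns)
    if suggestion is None:
        return None
    return line.replace(misspelled, suggestion)


# Hypothetical schema and failing query, for illustration only.
columns = ["customer_id", "order_date", "total_amount"]
fixed = quick_fix("SELECT custmer_id FROM orders", "custmer_id", columns)
print(fixed)
```

A production system would also need the error message and the table schema to decide which token is misspelled; this sketch assumes that is already known.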


“One of the best things about Databricks Assistant is how it can automatically document your tables. A pop-up offers assistance with an error, and nine times out of 10, you click ‘yes,’ and the assistant makes everything perfect with the click of that button. So, that alone has made things significantly easier and more productive.” — Andy Featherstone, Manager of Data Engineering, RDSolutions

Diagnosing Job Errors

Databricks Assistant now offers the ability to directly diagnose errors from the Workflows page. To start, we specifically focused on authoring-related job errors within notebooks. In the future, we’ll also add support for other common types of job errors, such as misconfigured job parameters, cluster-related issues like out-of-memory errors, task-level failures within job runs, and downstream impact analysis to understand how a failure affects dependent jobs or data consumers.

Visualization and Dashboard Creation

Databricks Assistant has simplified the process of creating visualizations and dashboards, enabling users to quickly transform raw data into meaningful insights. This feature has been particularly valuable for presenting complex data in easily digestible formats.

Enhanced Security and Privacy

In response to growing data privacy concerns, Databricks introduced an exclusively Databricks-hosted Assistant in late 2024 on AWS and Azure. This version ensures that all data processing remains within the Databricks account, leveraging Databricks-hosted models and the secure infrastructure that powers Databricks Model Serving. We plan to expand support to include both inline and side panel chat in the future.

Threads and Conversation Management

Databricks Assistant utilizes a thread-based system for managing conversations, allowing users to create and resume multiple discussion threads across different contexts within the Databricks Platform. The Assistant leverages conversation history to provide contextual responses, enabling users to refine or build upon previous interactions without rewriting entire prompts. Ongoing conversations with the Assistant also include citations to Databricks docs when applicable and dividers with links to relevant reference objects and pages.

Assistant Usage Logs

Admins and managers can now track Assistant adoption and engagement with the newly introduced Assistant system table (system.access.assistant_events). Each row in this table logs user interactions with the side panel or inline chat.

We’ve created a custom sample dashboard that allows you to visualize key information quickly. This dashboard provides insights on active users by day and month, active users per workspace, top users overall, and submissions data both per workspace and in total.
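As a rough illustration of the metrics the dashboard reports, the sketch below computes daily active users and submissions per workspace from a handful of event rows. The column names (`event_date`, `workspace_id`, `initiated_by`) are assumptions about the system table's schema and should be checked against the table itself; the sample rows are invented. The SQL string shows what an equivalent query might look like under the same assumptions.

```python
from collections import defaultdict
from datetime import date

# Assumed column names; verify them against system.access.assistant_events.
DAILY_ACTIVE_USERS_SQL = """
SELECT event_date, COUNT(DISTINCT initiated_by) AS active_users
FROM system.access.assistant_events
GROUP BY event_date
ORDER BY event_date
"""

# Hypothetical sample rows standing in for rows read from the system table.
events = [
    {"event_date": date(2024, 12, 1), "workspace_id": "ws-1", "initiated_by": "ana@example.com"},
    {"event_date": date(2024, 12, 1), "workspace_id": "ws-1", "initiated_by": "ana@example.com"},
    {"event_date": date(2024, 12, 1), "workspace_id": "ws-2", "initiated_by": "ben@example.com"},
    {"event_date": date(2024, 12, 2), "workspace_id": "ws-1", "initiated_by": "ana@example.com"},
]


def daily_active_users(rows):
    """Count distinct users per day."""
    users_by_day = defaultdict(set)
    for row in rows:
        users_by_day[row["event_date"]].add(row["initiated_by"])
    return {day: len(users) for day, users in sorted(users_by_day.items())}


def submissions_per_workspace(rows):
    """Count logged interactions per workspace."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["workspace_id"]] += 1
    return dict(counts)


print(daily_active_users(events))
print(submissions_per_workspace(events))
```

In a Databricks notebook, the query string could be passed to `spark.sql` instead of aggregating rows in Python, provided the assumed column names match the actual schema.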

“The introduction of Databricks Assistant has truly impressed me. I no longer have to write code. What used to take me one hour to write I did in five minutes. From the advanced users to the basic users at Corning, everyone is amazed by the immediate impact.” – Jibreal Hamenoo, Principal System Engineer, Data Engineering, Corning Incorporated

Catalog Explorer Integration

The integration of Catalog Explorer with Databricks Assistant enhances the functionality and accuracy of the AI-powered assistant. This integration leverages the rich metadata and context provided by Catalog Explorer to deliver more relevant and personalized responses.

We’ve introduced new agents to deliver detailed information on table lineages and insights. Users can invoke these agents with commands like /getTableLineages to view upstream and downstream dependencies or /getTableInsights to access metadata-driven insights, such as user activity and query patterns. This enables the Assistant to answer questions like “show me downstream lineages” or “who queries this table most often.”
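For readers unfamiliar with lineage, a question like “show me downstream lineages” amounts to walking a dependency graph outward from one table. The sketch below shows that traversal on a small, made-up graph. It does not reflect how the /getTableLineages agent works internally; the table names and the `downstream_tables` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables that read from it directly.
downstream_edges = {
    "raw.orders": ["silver.orders"],
    "silver.orders": ["gold.daily_revenue", "gold.customer_ltv"],
    "gold.daily_revenue": [],
    "gold.customer_ltv": [],
}


def downstream_tables(start, edges):
    """Return every table that depends on `start`, directly or indirectly, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(edges.get(start, []))
    while queue:
        table = queue.popleft()
        if table in seen:
            continue  # already visited through another path
        seen.add(table)
        order.append(table)
        queue.extend(edges.get(table, []))
    return order


print(downstream_tables("raw.orders", downstream_edges))
```

Upstream lineage is the same walk over the reversed edges.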

Improve SQL Efficiency

Use syntax-highlighted warnings and the /optimize command to improve inefficient SQL queries. Recommendations appear in real time, helping you quickly identify issues such as missing partition keys, inefficient WHERE clause filters, high-cardinality GROUP BY operations, or costly joins on STRING columns.

Improved Assistant Accuracy and Reliability

This year, we introduced key updates to enhance the quality and reliability of the Databricks Assistant. We improved table search accuracy so that it handles queries more effectively, even without exact matches. We also expanded documentation retrieval, which now informs around 45% of all Assistant interactions, to ensure up-to-date responses drawn from Databricks, MLflow, Spark, and Delta documentation.

We also improved support for Delta Live Tables by introducing heuristics to detect DLT-related queries and trigger tailored responses. These responses include targeted documentation and instructions on topics like ingestion, observability, and version control, increasing helpfulness from 12% to 40%.

What’s Coming Next

We’re dedicated to making the Databricks Assistant smarter, more intuitive, and more personalized to your needs. Here’s a preview of what you can expect:

Flexible Code Execution: Code execution will be available in the side panel across various pages, including the Catalog Explorer. This will let users run code without switching context, while chat history is preserved for easy reference. Users will be able to execute code and return to previous conversations in the same place.

Quick Fix Improvements: We’re introducing personalized code retrieval, leveraging snippets from successful cell executions and viewed code to provide more relevant suggestions. Additionally, we’re updating our triggering logic to include more error types. Finally, we’re exploring consecutive, multi-line suggestions.

Targeted Edits for Large Cells: We’re working on generating more precise code changes instead of replacing entire blocks, improving performance and usability for cells longer than about 20 to 30 lines.

Get Started

Use the Databricks Assistant today to describe your task in natural language and let the Assistant generate SQL queries, explain complex code and automatically fix errors. We are excited to see what Data and AI projects you will build with the help of the Assistant. Start using the assistant by finding the Assistant icon in your Databricks environment.

Check out our product page to see the Databricks Assistant in action, or read the documentation for more information on all the features.
