# How to Re-evaluate Dynamic Datasets in LangSmith

## Metadata

- **Published:** 9/1/2023
- **Duration:** 12 minutes
- **YouTube URL:** https://youtube.com/watch?v=WyjjZy4pdjY
- **Channel:** nerding.io

## Description

In this video, I will guide you through the LangSmith dashboard and monitoring features. We will be looking at the LangSmith cookbook and a testing example that involves dynamic data. I will walk you through the steps to clone or download the necessary files and set up the prerequisites. Then, we will build our dataset and define our Q&A system. We will use the OpenAI GPT API and execute our chain to get answers to specific questions. Finally, we will explore the chain of events and evaluate the correctness of our results. No action is required from you, but feel free to follow along and ask any questions you may have.

📰 FREE eBooks & News: https://sendfox.com/nerdingio
👉🏻 Ranked #1 Product of the Day: https://www.producthunt.com/posts/ever-efficient-ai
📞 Book a Call: https://calendar.app.google/M1iU6X2x18metzDeA

🎥 Chapters
00:00 Introduction
01:00 Setting Up Prerequisites
02:48 Building the Data Set
03:13 Defining the Q&A System
03:46 Executing the Chain
04:18 Analyzing the Results
07:20 Run Evaluation
09:22 Re-evaluate Dataset

🔗 Links
https://github.com/langchain-ai/langsmith-cookbook/tree/main
https://github.com/langchain-ai/langsmith-cookbook/blob/main/testing-examples/dynamic-data/testing_dynamic_data.ipynb
https://smith.langchain.com/

⤵️ Let's Connect
https://everefficient.ai
https://nerding.io
https://twitter.com/nerding_io
https://www.linkedin.com/in/jdfiscus/
https://www.linkedin.com/company/ever-efficient-ai/

## Key Highlights

### 1. Dynamic Data Testing with LangSmith

The video focuses on using LangSmith to test dynamic data, showcasing a workflow that uses the Titanic dataset to answer questions with Python code snippets and DataFrame functions.

### 2. Tracing Agent Execution

LangSmith allows tracing the execution of the agent, detailing the steps taken, including token usage, latency, observations, and interactions with the OpenAI model and the Python REPL tool.

### 3. Re-evaluating Data Sets

The video demonstrates how to re-evaluate datasets in LangSmith, even simulating new data being added. This allows for continuous monitoring and testing as the data evolves.

### 4. Custom Evaluation Criteria

The video showcases running evaluations based on custom criteria to assess the correctness of the agent's predictions, utilizing GPT-4 to score the accuracy of results derived from the Titanic dataset.

## Summary

### LangSmith Dynamic Data Re-evaluation: Video Summary

**1. Executive Summary:** This video provides a step-by-step walkthrough of using LangSmith to test and re-evaluate dynamic datasets, focusing on a practical example with the Titanic dataset. It demonstrates how to trace agent execution, evaluate results using custom criteria, and simulate dataset updates for continuous monitoring and testing.

**2. Main Topics Covered:**

* **Dynamic Data Testing with LangSmith:** Utilizing LangSmith to test a Q&A system based on the Titanic dataset.
* **Setting Up Prerequisites:** Cloning the LangSmith cookbook, installing required packages (pandas, OpenAI), and configuring API keys.
* **Building the Data Set:** Loading questions and corresponding Python code snippets (DataFrame functions) for answering them.
* **Defining the Q&A System:** Configuring the OpenAI GPT-4 API with a temperature of zero for deterministic output and integrating a Python REPL tool.
* **Executing the Chain:** Running the chain with a specific question and observing the generated answer.
* **Analyzing Results and Tracing Agent Execution:** Examining the LangSmith dashboard to trace the execution flow, including token usage, latency, observations, and interactions with the OpenAI model and Python REPL.
* **Running Evaluation:** Utilizing custom evaluation criteria to assess the correctness of the agent's predictions using GPT-4.
* **Re-evaluating Data Sets:** Simulating new data additions to the Titanic dataset and re-running the evaluation to assess performance changes.

**3. Key Takeaways:**

* LangSmith offers powerful tools for testing and evaluating AI models working with dynamic data.
* Tracing agent execution provides detailed insights into the decision-making process, helping to identify areas for improvement.
* Custom evaluation criteria allow for targeted assessment of model performance based on specific requirements.
* Re-evaluation capabilities enable continuous monitoring and testing as datasets evolve, ensuring ongoing accuracy and reliability.
* LangSmith allows for testing of code snippets within a dataset.

**4. Notable Quotes or Examples:**

* Example of using DataFrame functions with Python code snippets to answer questions about the Titanic dataset.
* "We're actually using DataFrame functions to figure out our answer, and that's really cool because not only are we using snippets from Python, but we can actually use things like storing API requests and search arguments, which is pretty awesome."
* Description of tracing agent execution: "Not only is it telling us things like our tokens and our latency, but it'll actually go through how it's doing the observations."
* Explanation of custom evaluation criteria: "We're looking for the custom criteria evaluation chain and looking at the prediction and the reference."

**5. Target Audience:**

* AI/ML Engineers
* Data Scientists
* Developers working with LangChain and large language models
* Individuals interested in using LangSmith for monitoring, evaluation, and debugging AI applications, especially those involving dynamic data.
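The workflow summarized above — pairing each question with a pandas snippet that computes its reference answer, then re-deriving those answers when the data changes — can be sketched without any API calls. This is a minimal local approximation only: the actual cookbook notebook downloads the full Titanic CSV, uploads the question/answer pairs through the LangSmith client, and replays the agent against the dataset; the toy DataFrame, the two sample questions, and the `eval`-based snippet execution below are illustrative assumptions, not the cookbook's code.

```python
import pandas as pd

# Toy stand-in for the Titanic CSV (the cookbook downloads the real dataset).
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Age": [22, 38, 4, 35, 16, 54],
})

# Each question is paired with a DataFrame snippet that computes its
# reference answer -- this is what gets stored as the dataset "output".
examples = [
    ("How many passengers are in the dataset?", "len(df)"),
    ("How many children under 18 survived?",
     "int(((df['Age'] < 18) & (df['Survived'] == 1)).sum())"),
]

def reference_answers(frame):
    # Re-derive the expected answers from the *current* data, so the
    # dataset stays correct as the underlying CSV evolves.
    return {q: eval(code, {"df": frame}) for q, code in examples}

before = reference_answers(df)

# Simulate new data arriving, as in the video: duplicate the records.
df2 = pd.concat([df, df], ignore_index=True)
after = reference_answers(df2)

print(before)
print(after)
```

The re-evaluation step in the video corresponds to the `df2` half: after duplicating the records to mock incoming data, the same stored snippets yield updated reference answers, which is what lets the dataset be re-evaluated later without hand-editing expected outputs.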
## Full Transcript

Hey everyone, and welcome to nerding IO. I'm JD, and today we're going to be looking at LangChain again, specifically testing dynamic data. We'll be looking at the LangSmith dashboard and monitoring how that works. Before we jump in, I just wanted to bring up Ever Efficient AI, our AI automation agency, and say a huge thanks: we've been getting featured on multiple different AI tool directories, so thank you again.

All right, let's dive into it. If you go to the LangSmith cookbook, they have a testing example, specifically this dynamic data one, so you're going to want to clone or download this. If you haven't been following along in the series, please go back to the beginning so you can get LangSmith installed. Once we have this downloaded, we're going to run a Jupyter notebook and actually launch this — that's what I have running in the background here — and it'll take us right to our dataset.

It has some prerequisites: you need a LangChain API key, and you want to make sure you have everything pip installed, including pandas, and your OpenAI key. What's interesting is that we're going to take a dataset — the Titanic dataset, which is open source and which you can just download as a CSV — and what's really cool is we're taking different questions and then actually using DataFrame functions to figure out our answer. Not only are we using snippets from Python, but we can use things like storing API requests and search arguments, which is pretty awesome.

All we need to do to get this running is go ahead and start running each line item here, so we'll put our questions in, and it says that now we're creating a dataset. I do want to point out one thing: I always change this and add a .env file — I like that better than
putting everything in the command line and exporting these, but do whatever you like. I've added this so it'll load a .env file. Next, we'll actually start building our dataset. You can see that our dataset is going to be this dynamic Titanic CSV: we're creating it and then looping through and loading each question, as well as the code that we want to run in order to get the answer for that question. So let's go ahead and run this.

Next, we're going to define our Q&A system. Again, we're taking the Titanic path for the CSV, doing a pandas read, and then we start importing the partials. When we're importing these partials, we're actually using the OpenAI GPT-4 API with a temperature of zero, which makes the output deterministic — when we're dealing with data, we want no room for flexibility. So let's go ahead and continue.

Now that we have this information run, we will actually execute, or invoke, our chain: we pass our input, ask it a question, and we're expecting it to give us this answer. In order to get this answer, it'll actually run these Python snippets, which again is really awesome. So let's invoke this and see what we get.

All right, as you can see, we got a new answer — it's still telling us the number of passengers. What we're going to do now is look at how we can actually trace this. If we go ahead and open LangSmith — I'm just using the same project as last time, the tracing cookbook tutorial — you can see this execution is the latest one we have. What we can do is look at the chain of events here, and this is really interesting as we go through: not only is it telling us things like our tokens and our latency, but it'll actually go through how it's doing the
observations. This is what it's observing as the input — over here, the number of passengers determined by counting the number of rows in the DataFrame. It shows the action, and then the action input: the Python that we're going to run. Now we go into OpenAI, and we're defining what we want it to do: we say that we are working with a pandas DataFrame in Python, that the DataFrame's name is df, and that we're going to be using the tools to answer a question. We're also using the Python REPL AST tool. We're trying to answer the question: think about what we're going to do, take an action, then the action input, and then our observation. It also has the result of the print, and it's actually looking at the dataset that we put in, which is really interesting.

Then we're looking at our tool — another piece of LangChain, the Python REPL AST — and at the input, which is the function that we defined, and our output. Then we go back to our chain, and it states the thought — this is it actually determining what its thought is: the number of passengers — then its action, the function, the new observation, and then the output: "I know the final answer, and the final answer is there were 891 passengers on the Titanic." And here is our final run. You can also look at things like feedback, if there is any, and the metadata associated with it.

Cool. Now we're going to go back to our notebook and look through the run evaluation. We'll continue running our script — we can just keep using the play button. What we're doing is looking for the custom criteria evaluation chain, looking at the prediction and the reference, which you can see right here. So let's go ahead and start running this evaluation. Now we're in our configuration: we're
going to be looking for correctness. We've got our LLM, which is GPT-4, and we're going to be running it on our dataset — all of which we defined above in the configuration for the dataset on the Titanic CSV. So let's go ahead and run this. Now it's running, and it gives you a result for a public project. When I clicked it, this link didn't actually work at first — we'll give it a try though — oh, it does, okay, cool, great. This is actually giving us an evaluation: how many children under the age of 18 survived? It takes us through this entire chain, which we just looked at, and we're able to see again the inputs that we're looking at — again, this is a DataFrame Python code snippet that we're actually running.

The last thing we can do is run a re-evaluation at a later time. What this means is: say none of the data has changed, but if we had more data coming in, we'd be able to re-evaluate it, or rerun and reuse the existing data. So we're going to go ahead and run this. Again, we are putting together some Python snippets — we're basically mimicking the fact that we want extra data, so we're mocking this data out by essentially duplicating the number of records. Then we'll go ahead and make the change, run our chain, and look at the chain results as they run.

All right, now that this has run, it's telling us to review the results again. If we go back to LangSmith, what we're going to do this time is actually click the data button, which represents our datasets. Right here we can see that this is our Titanic CSV 2, and we can see the run from the previous run — this is showing all the runs and the correctness on this dataset. Again, it takes us through our agent execution into our LLMs, our tool, and
again showing us, in this case, the number of passengers on the Titanic based on the DataFrame. If we go back to our dataset, we see a new test count; we can look at this again, and you can see it's run about 10 times. If we go back to our dataset and actually look at the examples, these are all the examples that we're pulling in, and we can see the test count of how many times each has been run. We could even add an example here based on our inputs and our outputs, and we can export, or actually initiate, a new test straight from LangSmith, so we don't need to do all of it from the notebook.

So that takes us to the end of this video. What we went over today was building a dynamic dataset with the Titanic CSV in LangSmith; we also went through how to trace and look at the execution of the agent, as well as the Python tool, which allows us to actually run the Python snippets. Hopefully that was helpful. If you'd like us to do anything specific, or if you have any questions, please leave them in the comments below. Remember to like and subscribe, and happy nerding!

---

*Generated for LLM consumption from nerding.io video library*