# Streaming Live With Digital Avatars Using Next.js

## Metadata

- **Published:** 7/15/2024
- **Duration:** 15 minutes
- **YouTube URL:** https://youtube.com/watch?v=3e5Fixl_nQE
- **Channel:** nerding.io

## Description

Explore the future of digital avatars with our comprehensive tutorial on live streaming digital avatars using Next.js! In this video, we'll walk you through the HeyGen Official Streaming Avatar Next.js Demo, showcasing how you can implement and customize your own digital avatars for live streaming applications.

📌 Key Highlights:
- Setting up the Next.js environment for avatar streaming
- Integrating the HeyGen Streaming Avatar library
- Customizing avatar interactions and appearances
- Real-time avatar rendering and performance optimization

Whether you're a developer looking to enhance your live streaming capabilities or a tech enthusiast curious about the latest in digital avatar technology, this tutorial has something for everyone. Don't forget to like, comment, and subscribe for more cutting-edge tech tutorials!

📰 News & Resources: https://sendfox.com/nerdingio
📞 Book a Call: https://calendar.app.google/M1iU6X2x18metzDeA

🎥 Chapters
00:00 Introduction

🔗 Links
https://www.heygen.com/
https://github.com/HeyGen-Official/StreamingAvatarNextJSDemo/tree/main
https://huggingface.co/docs/transformers.js/main/en/api/pipelines#module_pipelines.AutomaticSpeechRecognitionPipeline

⤵️ Let's Connect
https://everefficient.ai
https://nerding.io
https://twitter.com/nerding_io
https://www.linkedin.com/in/jdfiscus/
https://www.linkedin.com/company/ever-efficient-ai/

## Key Highlights

### 1. HeyGen: AI-Powered Video Creation at Scale
HeyGen offers dynamic digital avatars and video-creation tools, suitable for automated video generation triggered by CRM data or other events.

### 2. Next.js Streaming Avatar Demo
The video explores a Next.js demo that uses HeyGen to stream digital avatars, showcasing real-time text interaction and voice input.

### 3. Security Concerns with OpenAI Key Exposure
The demo exposes the OpenAI API key in the browser, posing a security risk. Alternatives such as server-side transcription or Transformers.js are suggested for production environments.

### 4. Vercel AI SDK for Streaming
The project uses Vercel's AI SDK and `useChat` hook for real-time streaming between the client and server, enabling interactive conversations with the avatar.

### 5. Potential Applications: Customer GPTs
The streaming-avatar implementation opens possibilities for creating interactive customer GPTs, offering assistance or representing a company in a personalized way.

## Summary

**1. Executive Summary:**

This video explores the HeyGen Streaming Avatar Next.js Demo, demonstrating how to implement and customize digital avatars for live streaming applications. It covers setting up the Next.js environment, integrating the HeyGen library, and customizing avatar interactions, and it highlights potential security concerns around API key exposure, suggesting alternative solutions.

**2. Main Topics Covered:**

* **Introduction to HeyGen:** AI-powered video creation at scale using dynamic digital avatars.
* **Setting up the HeyGen Streaming Avatar Next.js Demo:** Cloning the repository, installing dependencies, and configuring API keys (HeyGen and OpenAI).
* **Security Considerations:** Risks of exposing the OpenAI API key in the browser, and alternatives such as server-side transcription or Transformers.js.
* **Using the Vercel AI SDK:** Utilizing the `useChat` hook for real-time streaming between the client and server for interactive avatar conversations.
* **Live Avatar Interaction:** Demonstrating real-time text input, voice input (with a discussion of in-browser transcription vs. server-side or Transformers.js), avatar responses, and session management.
* **Potential Applications:** Discussing the possibility of creating interactive customer GPTs powered by digital avatars.
* **Code Walkthrough:** Reviewing the relevant code in the demo, covering token generation, session management, media-stream handling, and the integration of `useChat`.

**3. Key Takeaways:**

* HeyGen offers a platform for creating and streaming digital avatars that can be integrated into Next.js applications.
* The provided Next.js demo allows real-time interaction with the avatar through text and voice, showcasing a practical implementation.
* Security is paramount: exposing API keys client-side is a significant risk that must be addressed in production. Server-side transcription, or in-browser transcription with open-source models via Transformers.js, is a viable alternative.
* The Vercel AI SDK's `useChat` hook streamlines managing streaming conversations between the client and server.
* Streaming avatars open up opportunities for engaging customer-service experiences, personalized assistants, and other interactive applications.

**4. Notable Quotes or Examples:**

* "They [HeyGen] tout it as AI-powered video creation at scale... they're actually creating digital avatars and videos... that you can then change dynamically."
* "This [OpenAI API key in the browser] is a little concerning, but it's a demo, so it is what it is. If you were going to put this in production you definitely want to do something different: you don't want to expose your [OpenAI key], it's very easy to figure that out..."
* "...you could actually create a conversation back and forth in the browser, right? So you could send information... imagine that this simple chat right here is actually an assistant or a customer GPT..."

**5. Target Audience:**

* Developers interested in integrating digital avatars into web applications.
* Tech enthusiasts exploring the use of AI and live streaming technologies.
* Next.js developers looking for practical examples of using the Vercel AI SDK.
* Individuals considering HeyGen or similar services for avatar-driven video creation.
* Anyone interested in customer GPTs and AI-driven customer experiences.

## Full Transcript

Hey everyone, welcome to nerding IO. I'm JD, and today we're going to be taking a look at HeyGen, specifically the feature to stream digital avatars in Next.js. With that, let's go ahead and get started.

All right, so the first thing we're going to do is look at what HeyGen is. Basically, they tout it as AI-powered video creation at scale. What they're doing is actually creating digital avatars and videos that you can then change dynamically. You can see some of the information here, like if you wanted to generate an automatic video when someone submits through a CRM or something. I got interested in them for a little bit for the marketing, but then also when they started doing streaming, which is what we're going to dig into today, and we'll look through an example that they have come out with in Next.js. They aren't sponsoring this video or anything; I just like the service and thought it was cool.

The first thing is you need to log in. You can actually build your own avatar, and they give you one free one, but we're just going to use the default ones they have. Once you're in here, they have a lot of different things you can do as far as creating different avatars that you can build dynamically: they have a studio, they even have URL-to-video, translations, all kinds of stuff. But we're going to go get our API key, because that's the first thing we need. If we come down to space settings, we have our API; you need a trial token, and otherwise you need an Enterprise license. So what we're going to do is, if you look at the repo,
they actually go through what this means. Basically, the difference between a trial token and an Enterprise token is that a trial allows only three concurrent streaming sessions, and each session closes after 10 minutes. So there are some limitations if you want scale; otherwise you need an Enterprise token, but to play with it today you can get started for free. Once we have our token, you just go in, there's another button that says activate, and then you copy it.

Now we're going to pull down the repo and get started with their demonstration, which is pretty slick. First thing, go ahead and copy this code the same as you would anything else, just clone it and pull it down; we'll have the link in the description. I've already pulled this down and done an `npm install`, and I wanted to show you the things you'll need in order to get started. First, you're going to need that API key we just copied, so put it here. You're going to need an OpenAI key, and you also need an OpenAI key for the Next.js public environment variable. This is a little concerning, but it's a demo, so it is what it is. If you were going to put this in production you definitely want to do something different: you don't want to expose your OpenAI key, it's very easy to figure that out. Even here it says "dangerously in browser," so we want to avoid that altogether. As a way around that, we'll actually go and look at what it's doing. First we're fetching our access token, but let's find where this is actually using OpenAI in the browser. Basically right here: this is doing the transcription. Another way to do this would be to take this WAV file, send it to your back end, and transcribe it on the fly.
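The suggestion above, sending the recorded audio to your own back end so the OpenAI key never reaches the browser, could look roughly like this as a Next.js App Router route handler. This is a sketch, not the demo's code: the route location (`app/api/transcribe/route.ts`), the `file` form-field name, and the supported-type list are assumptions for illustration; the REST endpoint and `whisper-1` model name are OpenAI's public transcription API.

```typescript
// Hypothetical app/api/transcribe/route.ts: server-side transcription so the
// OpenAI key stays in process.env on the server, never in the browser.

const SUPPORTED = ["audio/wav", "audio/webm", "audio/mpeg", "audio/mp4"];

// Guard against non-audio uploads before forwarding anything upstream.
export function isSupportedAudio(mimeType: string): boolean {
  return SUPPORTED.includes(mimeType);
}

export async function POST(req: Request): Promise<Response> {
  const form = await req.formData();
  const file = form.get("file");
  if (!(file instanceof Blob) || !isSupportedAudio(file.type)) {
    return new Response(JSON.stringify({ error: "unsupported audio" }), {
      status: 400,
      headers: { "content-type": "application/json" },
    });
  }

  // Forward the recorded blob to OpenAI's transcription endpoint server-side.
  const upstream = new FormData();
  upstream.append("file", file, "recording.wav");
  upstream.append("model", "whisper-1");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: upstream,
  });
  const data = await res.json();

  // On success the API returns { text: "..." }.
  return new Response(JSON.stringify({ text: data.text }), {
    headers: { "content-type": "application/json" },
  });
}
```

The client would then POST its `MediaRecorder` blob to this route instead of calling OpenAI directly, which removes the need for the `NEXT_PUBLIC` key entirely.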
Another option, if you subscribe to this channel, would be to take a look at Transformers.js; we have a bunch of videos on that, and it's used in a lot of projects. They actually have a pipeline that runs in the browser, so you can load in Whisper without ever having to make an API call. It's all free; these are open-source models that you can just load in. In this example they're using a WAV file, but you can actually do real-time transcription on the fly. You can see it's recognizing my voice, the load time for the model is pretty quick, and it just keeps grabbing what I'm saying. That's how I would do this in a production example, but for now we're just going to go through and see what the demo is doing.

If we run `npm run dev`, we can then go to our localhost and actually go through their demo. Like I was talking about earlier, you can make your own custom avatar; it's pretty simple to do and I'd definitely encourage you to try it. When you go in, there's the ability to create an instant avatar that will take you through the steps of building your own, but we're going to use the demo for now. We're just going to select Edward in a blue shirt; you can see it's grabbing the custom avatar ID, so if you wanted to use your own you just have to put in your own avatar ID. Same thing with a voice: we're going to use Paul, and then we're going to start the session. What this is doing is creating a web stream, a WebSocket, so you're streaming information back and forth. You can see it basically has an intro for this digital avatar, and then you can text back and forth. We're just going to say "hey there" and send it, and you can see the information coming back about when it's talking. Now we're going to say "hey there, just showing some examples on HeyGen and
how you work," and we'll go ahead and send that. "Hello! I'd be happy to help you with that. If you have specific questions or need examples of how I can assist you, feel free to ask. Whether it's providing information, solving problems, or generating creative content, I'm here to help." All right, so you can interrupt the task to kind of start over, or, since this is keeping a constant connection, we can end the session. Remember, because we're on a trial, only a 10-minute session is allowed. We're going to end that for now and actually go through the code to see what's going on.

If we come down to the video element, we see this is actually going to be a video stream, and we're going to figure out what this video-stream object is. We see our buttons where we can start, interrupt (that's what we just did), or end the session. This is basically showing us our loading screen, so we'll minimize that, and this is where we start to see the information for the avatar, whether we're sending text or actually using the recording, which is this button. Let's start digging into the HTML and see what's actually happening. Really, it's just taking this input and, on key down or submit, triggering it back in the main Avatar component. Continuing on, we already looked at the transcribe function; this is where we could send the transcription to OpenAI, but we could also use Transformers.js. We have our start-recorder function: they're just using the MediaRecorder API, which we've seen in a few examples, recording natively in the browser. Then they take each chunk and create an audio file; that audio file is stored, and in the same way we take that blob, the audio blob we just
recorded, and send it; this is basically just a file name that we're making up on the fly. As we keep going through, we have our speak function and our end-session function. Let's find what happens once we actually get the token. Right here is our start-session, and up here is our token: it takes our API key and calls out to get an access token, which then allows us to start a session. You can basically think of it as each session gets a token. Then we pass all the configuration for our avatar to start, and right here what's returned is our response data and our avatar media stream, which is where we're actually getting the information from the media stream.

Real quick, everyone: if you haven't already, please remember to like and subscribe, it helps more than you know. With that, let's get back to it.

All right, now let's look at what's actually happening in the back end. If we look at this access-token route, we can see we first check that our environment variables are there, then we create our token and respond with what is essentially a new session token. And when we send information, either what's been recorded or what we type in, we want streaming text back. We have our max duration here, but we're allowing streaming of the information, so we're actually using the LLM to do the communication. This is where we look at our messages, the same as we would in a chat, but we're streaming that information back and forth. How is this actually working? We have our stream, which comes from the Vercel AI SDK, and we're piping that to our AI stream response. So even
though this is in our API chat route, what's happening is: when you come in here, we first fetch our access token, and then we've created our avatar as we handle the chat itself. We're submitting information back and forth with this `useChat`. The first thing we check is whether we've initialized the application, and then whether the avatar is current; if the avatar is current, we're going to speak. We've also set our initial messages, so we're pulling from the speak event and handling our messages: we have our session, our message content, and then our message. It's actually going out (this is the `onFinish` for `useChat`), getting information, and pulling it back.

So let's look at this in the back end and see exactly what's going on. We go back to the browser; I think it's maintaining state from last time, so let's end this session, get our Network tab up, and then start. As you can see, we're getting our access token and then starting our stream. I want to see if there are any WebSockets; there aren't, okay, interesting. Then we'll have the avatar say "hey there" again and watch what's going on: we have our task, and we've got information coming back. Now we'll do the same but check the chat: we've hit our "I'm here and ready to chat," streaming the information to it, and then it's streaming the information back. This chat API call we're making, which you can see in the headers, is the route we used, and we're doing that with the Vercel AI SDK. What's really interesting about this is that you could actually create a conversation back and forth in the browser. You could send information; imagine that this
simple chat right here is actually an assistant or a customer GPT, and you're having a conversation with a digital avatar that could be a representation of yourself or your company. We've seen examples like Pieter Levels doing therapy in the browser. There are a lot of different things you could do now that you have this implementation right inside a Next.js demo; you just have to create the interaction back and forth. That's why I think this is really cool. It's definitely something I'm going to keep playing with, and it wasn't too difficult to set up.

All right, that's it for us today, everyone. What we went through were some of the basic features of HeyGen, as well as how to set up a streaming avatar in Next.js. If you haven't already, please remember to like and subscribe, and with that, happy nerding!

---

*Generated for LLM consumption from nerding.io video library*
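The conversation loop the transcript describes, where user text goes through the chat route and the finished reply is handed to the avatar's speak call via `onFinish`, can be sketched framework-free. This is illustrative only: `sendToChat` and `speak` below are hypothetical stand-ins injected by the caller, not the Vercel AI SDK or HeyGen SDK signatures.

```typescript
// Minimal sketch of the chat-to-avatar loop: send the user's text, append the
// assistant's reply to the history, and hand that reply to the avatar.

export interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// The text the avatar should speak: the most recent assistant reply, if any.
export function latestAssistantReply(messages: ChatMessage[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "assistant") return messages[i].content;
  }
  return null;
}

// One conversational turn, mirroring what the demo's `onFinish` callback does:
// the completed reply is both stored in the history and spoken by the avatar.
export async function takeTurn(
  history: ChatMessage[],
  userText: string,
  sendToChat: (history: ChatMessage[]) => Promise<string>, // stand-in for /api/chat
  speak: (text: string) => Promise<void>, // stand-in for the HeyGen speak call
): Promise<ChatMessage[]> {
  const next: ChatMessage[] = [...history, { role: "user", content: userText }];
  const reply = await sendToChat(next);
  next.push({ role: "assistant", content: reply });
  await speak(reply);
  return next;
}
```

Wiring `sendToChat` to the demo's streaming chat route and `speak` to the avatar session would reproduce the back-and-forth shown in the video.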