# Real-Time AI Public Speaking Feedback (Gemma + MediaPipe)

## Metadata

- **Published:** 4/10/2025
- **Duration:** 11 minutes
- **YouTube URL:** https://youtube.com/watch?v=v7YJCDEfqaU
- **Channel:** nerding.io

## Description

🧠 AI Starter Kit (Side Project): http://aifirebasestarter.dev/

In this project, we explore how to build a Web AI-powered public speaking coach that runs entirely in the browser, using Gemma for LLM processing, MediaPipe for facial/body analysis, and a Bluetooth heart rate monitor for real-time biofeedback.

What It Does:
💬 Gives real-time feedback on your tone and content
📊 Tracks your heart rate while speaking to assess nervousness
🗣️ Uses Gemma to provide AI-generated tips, summaries, and practice prompts
🔋 All client-side: runs in the browser with no data sent to servers

🔍 What You'll Learn:
✅ How to connect a Bluetooth heart rate sensor in the browser
✅ How to use MediaPipe for tracking face, hands, and posture
✅ How to embed Gemma LLM locally using WebGPU
✅ How to combine these inputs into actionable, AI-generated coaching feedback
✅ Creating a privacy-first AI application, no backend required!

💡 Why This Is Cool
You don't need an expensive coach or cloud APIs to get better at public speaking. With just a browser and a heart rate monitor, you can train with a real-time, personalized AI coach that respects your privacy.

🎯 Use Cases:
- Public speaking practice
- Interview preparation
- Speech anxiety coaching
- Real-time performance review

🎥 Watch the full demo and see it in action!

🔗 Resources & Links:
🧠 AI Starter Kit (Side Project): http://aifirebasestarter.dev/
📩 Newsletter: https://sendfox.com/nerdingio
📞 Book a Call: https://calendar.app.google/M1iU6X2x18metzDeA

📌 Chapters:
00:00 Intro
00:21 Project
01:20 Bluetooth
02:21 Demo
06:20 Code

Let's Connect & Build Together
🌐 https://nerding.io
🐦 Twitter: https://twitter.com/nerding_io
💼 LinkedIn: https://www.linkedin.com/in/jdfiscus/
🚀 Ever Efficient AI: https://everefficient.ai

💬 What features would you add to an AI speech coach? Drop your ideas below! 👇
👍 Like & Subscribe to keep up with cutting-edge local AI tools and creative builds!

## Key Highlights

### 1. Browser-Based LLM Public Speaking Coach
The project demonstrates a public speaking coach built entirely in the browser using web AI, HTML, and JavaScript, invoking an LLM directly without backend dependencies.

### 2. Real-Time Data Input: Voice, Video, Heart Rate
The coach uses real-time voice and video feeds, combined with heart rate data from a Bluetooth-connected monitor, providing rich data for analysis and comprehensive feedback.

### 3. Gemma Model Caching for Fast Loading
The Gemma 2 model is cached in the browser, enabling significantly faster loading times after the initial load and improving the user experience (see the loading sketch below).

### 4. Bluetooth Integration for Biometric Feedback
The project integrates with a Bluetooth heart rate monitor to gather physiological data, showcasing the potential of incorporating biometric feedback into AI-powered coaching systems.

### 5. Experimenting with Facial Recognition
Future experiments may explore facial recognition to add another layer of analysis and further improve the quality of feedback.
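The highlights above mention embedding Gemma locally with WebGPU and caching the model for fast reloads. Below is a minimal sketch of that flow using MediaPipe's LLM Inference API for the web, assuming an ESM import (bundler or `<script type="module">`); the model filename, CDN path, cache name, and parameter values are illustrative assumptions, not the project's exact code.

```js
// Minimal sketch: load a converted Gemma model in the browser with
// MediaPipe's LLM Inference API, caching the model file for fast reloads.
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Hypothetical location of the converted Gemma 2 model file (same origin).
const MODEL_URL = '/models/gemma2-2b-it-gpu-int8.bin';

// Fetch the model once and keep it in the Cache API so reloads are fast.
async function getCachedModelUrl(url) {
  const cache = await caches.open('llm-model-cache');
  let response = await cache.match(url);
  if (!response) {
    await cache.add(url);            // downloads and stores the model file
    response = await cache.match(url);
  }
  return URL.createObjectURL(await response.blob());
}

async function loadGemma() {
  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
  );
  return LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: await getCachedModelUrl(MODEL_URL) },
    maxTokens: 1024,
    temperature: 0.7,
    topK: 40,
  });
}

// Example usage:
// const llm = await loadGemma();
// const reply = await llm.generateResponse('Give me one public speaking tip.');
```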
## Summary

**1. Executive Summary:**

This video demonstrates a browser-based AI public speaking coach built with the Gemma LLM, MediaPipe, and Bluetooth integration for real-time feedback on tone, content, and heart rate. The project emphasizes privacy by running entirely client-side with no data sent to servers, showcasing a powerful AI application achievable without cloud APIs.

**2. Main Topics Covered:**

* **Project Overview:** Explanation of the public speaking coach's functionality, combining voice, video, and heart rate data for AI-powered feedback.
* **Bluetooth Integration:** Demonstration of connecting to a Bluetooth heart rate monitor in the browser to track nervousness and incorporate biometric data.
* **Gemma LLM Implementation:** Explanation of embedding the Gemma 2 model locally using WebGPU and caching it for faster loading times.
* **Real-Time Data Analysis:** Combining voice, video, and heart rate input to generate actionable coaching feedback from the LLM.
* **Code Walkthrough:** Brief overview of the codebase, including speech recognition, webcam access, Bluetooth connection, and LLM integration (a minimal sketch of the speech and webcam setup follows this summary).
* **Future Experiments:** Mention of potentially incorporating facial recognition to further improve the analysis and feedback.

**3. Key Takeaways:**

* It is possible to build a fully functional AI-powered application, specifically a public speaking coach, that runs entirely within the browser.
* Combining LLMs (like Gemma) with sensor data (voice, video, heart rate) enables rich, real-time feedback and personalized coaching experiences.
* Client-side processing preserves user privacy by eliminating the need for backend servers and data transmission.
* Bluetooth integration is a viable way to gather biometric data and use it to enhance AI-driven applications.
* Gemma model caching significantly improves loading times and the overall user experience.

**4. Notable Quotes or Examples:**

* "This is me doing public speaking and I'm a little nervous about it. So it's weird being on camera and talking about myself all at the same time." (Example of the speaker using the app and receiving feedback.)
* "We're going to invoke an LLM directly through the browser and nothing else." (Emphasizes the client-side nature of the project.)
* "[Caching the model] If I reload, it's super fast. I now have it in my debug. You saw how quick that was; before, you could see it was actually loading the LLM." (Highlights the performance benefit of model caching.)
* Discussion of leveraging facial recognition in the future to augment the feedback loop.

**5. Target Audience:**

* Web developers interested in AI and LLMs.
* Individuals seeking to build client-side AI applications.
* Developers interested in integrating Bluetooth devices with web applications.
* AI enthusiasts exploring privacy-focused AI solutions.
* Anyone interested in the intersection of AI, public speaking, and biometric data.
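As referenced in the Code Walkthrough topic above, the speech and webcam inputs come straight from browser APIs. Here is a minimal sketch, assuming Chrome's webkit-prefixed SpeechRecognition; the element ID and the `analyzeSpeech` handler are illustrative placeholders rather than the video's actual code.

```js
// Sketch: browser speech recognition plus a client-side webcam preview.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;      // keep listening while the speaker talks
recognition.interimResults = true;  // surface partial transcripts as they arrive

recognition.onresult = (event) => {
  const latest = event.results[event.results.length - 1];
  if (latest.isFinal) {
    analyzeSpeech(latest[0].transcript); // hand finished sentences to the LLM step
  }
};
recognition.start();

// Webcam preview, entirely client-side (assumes a <video id="preview"> element).
const video = document.querySelector('#preview');
navigator.mediaDevices.getUserMedia({ video: true, audio: false })
  .then((stream) => {
    video.srcObject = stream;
    video.play();
  });

// Placeholder: in the real app this feeds the analysis step described later.
function analyzeSpeech(text) {
  console.log('final transcript:', text);
}
```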
## Full Transcript

Hey everyone, welcome to nerding.io. I'm JD, and today what we're going to go through is how to build a public speaking coach, specifically with web AI, HTML, and JavaScript. That means we're going to invoke an LLM directly through the browser and nothing else. With that, let's go ahead and get started.

All right, a little explanation on what this experiment is. What I've done is take a real-time feed of my voice and my video, then also connect to a heart rate monitor, and take all of that information and, as I'm speaking, try to give me some coaching on, uh, public speaking. As you can see, I say "um" and "uh" a lot. I just thought this would be a fun kind of experiment to try completely in the browser.

You can see that I don't have the LLM stored just yet, so let's go ahead and get that started, and then I'll show you how to broadcast a heart rate monitor over Bluetooth. This is automatically going to start loading our model; you can see that it's coming over here, and as this is loading, eventually when it gets to 100% we're actually going to cache all of our information over here.

While that's loading, what I'm going to do is mock out a heart rate monitor and then connect to it through Bluetooth. I have over here an Android phone which is going to act as a heart rate monitor. You can do this with an nRF, uh, device: you need to make it connectable, then you can add a service UUID, and you can actually just type in "heart rate" to see a heart rate service; it'll give you the service name and a preview of the information that's going to get sent. Once you have all that information, you can broadcast this out, or advertise, sending the information out as an advertisement. What we're going to do is just go ahead and set this, and it's going to send a heart rate, basically, in a wave.

Now that I have this, you can see that my model has loaded down here. You can see the GPU model, and we're going to go through how we loaded that and everything else. But you can also see that it's now actually cached, so if I reload, it's super fast. I now have it in my debug; you saw how quick that was, whereas before you could see it was actually loading the LLM.

So now we're actually going to connect to the Bluetooth. I don't know what "Loop" is, but it seems like somebody else has another heart rate monitor on here, and the reason I know that is because we're only looking at the service UUID that we know belongs to a heart rate monitor. As you can see, we're requesting the Bluetooth, we know what our device is, we've got our GATT server, and we're going to start monitoring here in a second.

The way this will work is that once we start the monitor, we'll actually connect to the Bluetooth, and then we can see what the beats per minute is; it's just going to go up, basically, in a wave. But we know that we're actually connected: if you look, you can see the client here, so we know that we've connected from our advertiser to this device. We're getting information about who's connected, and we have our server information showing what we're sending out. Again, this is our client, and again, these are fake numbers.

So now what I want to do is actually have a conversation, and we'll see the LLM and its output after the fact. We're just going to start talking, and then we'll get an output.
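For reference, the Bluetooth connection demonstrated above maps onto the Web Bluetooth API roughly as follows. This sketch uses the standard GATT `heart_rate` service and `heart_rate_measurement` characteristic names, which match the service filtering described in the video; the function name and callback are illustrative.

```js
// Rough sketch of the Web Bluetooth flow for a BLE heart rate monitor.
async function connectHeartRate(onBpm) {
  // Must be triggered by a user gesture (e.g. a "Start monitor" button click).
  const device = await navigator.bluetooth.requestDevice({
    // Only devices advertising the standard Heart Rate service appear in the picker.
    filters: [{ services: ['heart_rate'] }],
  });

  const server = await device.gatt.connect();
  const service = await server.getPrimaryService('heart_rate');
  const characteristic = await service.getCharacteristic('heart_rate_measurement');

  characteristic.addEventListener('characteristicvaluechanged', (event) => {
    const data = event.target.value;  // DataView over the measurement bytes
    const flags = data.getUint8(0);
    // Bit 0 of the flags byte: 0 = 8-bit heart rate value, 1 = 16-bit value.
    const bpm = (flags & 0x01)
      ? data.getUint16(1, /* littleEndian= */ true)
      : data.getUint8(1);
    onBpm(bpm);
  });

  await characteristic.startNotifications();
  return device;
}

// Example usage:
// connectHeartRate((bpm) => console.log(`heart rate: ${bpm} bpm`));
```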
Okay. So this is me doing public speaking, and I'm a little nervous about it. So it's weird being on camera and talking about myself all at the same time.

And now what we're doing is seeing our output: we're actually performing the analysis and getting information back from our, uh, input. I like how the confidence was still fairly high, I guess, even though I said I was nervous; so definitely emotional, but, uh, it shows that that's evident.

So we're going to look at how I put all this together. Again, this is the LLM that's directly in the browser, this is a Bluetooth device that's actually getting polled, and then we're hitting the LLM directly in order to pull information back. We could actually take in other inputs to provide real-time feedback into the LLM and have it hit consistently, without the latency of a backend or anything else.

Real quick, this is another project that I'm working on. It's a Firebase and Next.js starter kit that comes with AI-powered apps. What this allows you to do is get up and running with Firebase, Next.js, and Genkit, as well as pre-built AI components. Some of the things we focused on were the ability to start with an AI-first mindset and built-in prompt instructions, so that you can build new features, build new blog posts, build documentation, and integrate directly into a chatbot that is built with a chat interface, as well as content generation and different prompts. If you sign up now, there's actually a discount going on where you can get 90% off, and this will fluctuate; it also comes with a social proof component which allows you to do dynamic discounts. Right now we're offering 90% off, so definitely check it out, and don't forget to like and subscribe.

So let's go ahead and take a look at the code real quick and see what we're looking at. Again, you can see there are only four files being served in this repo. I'm running it in this, uh, terminal, and I basically have an app.js as well as some HTML in here, so that I can have the elements that are on the screen as well as my own little debugger. I don't even have the MP3 here, um, that we could add later for more effect, but it kind of just makes you more nervous. And then there's being able to analyze speech.

If we look at the information here, when we're doing the monitor, the two main things are that we're taking the LLM and resolving it, as well as caching the file specifically. As we're going through, this is all just our initializing of elements; we have our initial state, and this is kind of the crux of it. If you saw the LLM web AI memory game that I made, it's very similar: we're pulling in Gemma 2, and we're doing the, uh, file resolver. With this GPU model, the model doesn't actually sit on the local machine; it goes out and fetches it, and this is where we're putting our file progress and showing that at first initialization. Then, when this file has been fully resolved, we're actually able to start doing LLM inference. So again, this is our data URL, and we have our maximum number of tokens, our temperature, and things like that.

And this is where we're going to start doing our speech setup. All this really is, is just using recognition straight from the, uh, browser. The other piece is that we're going to start analyzing text. You can see that if I don't have a heart rate, it's going to automatically stop analyzing text, because it wants the beats per minute as well as the text that's actually being sent.
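To make that step concrete, here is a rough sketch of how the current beats per minute and the recognized text might be combined into a prompt, with the model's reply cleaned up before parsing. The prompt wording, JSON shape, and all names are assumptions for illustration, not the repo's exact code.

```js
// Illustrative only: gate analysis on having a heart rate, build a prompt,
// then extract the JSON object from the model's (often markdown-wrapped) reply.
let currentBpm = null;  // updated by the heart rate notification handler
let busy = false;       // naive guard against overlapping analysis requests

async function analyzePerformance(llm, text) {
  if (currentBpm === null || busy) return;  // needs biometrics; skip overlaps
  busy = true;
  try {
    const prompt = `You are a public speaking coach.
Speaker heart rate: ${currentBpm} bpm.
Transcript so far: "${text}"
Reply only with JSON like {"confidence": <0-10>, "tone": "<word>", "tips": ["<tip>"]}.`;

    const raw = await llm.generateResponse(prompt);
    // Models often wrap the JSON in markdown fences or extra prose;
    // keep only the substring from the first '{' to the last '}'.
    const start = raw.indexOf('{');
    const end = raw.lastIndexOf('}');
    if (start === -1 || end === -1) return;
    renderAssessment(JSON.parse(raw.slice(start, end + 1)));
  } finally {
    busy = false;
  }
}

// Placeholder renderer: the real app builds richer HTML for the assessment.
function renderAssessment(result) {
  document.querySelector('#assessment').textContent =
    `Confidence ${result.confidence}/10. Tips: ${result.tips.join(' ')}`;
}
```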
Again, there's way more information that we could try to pull; this is just a very, you know, experimental example. Then we say, okay, I'm going to take all this information, send it in real time, and have it generate a response, um, to analyze the performance. Now, there are some race conditions: if it's trying to process and another request comes in, it might get overloaded.

The next thing we're going to do is clean and parse our JSON. We're expecting this information to come back, but what happens is that it gets returned as markdown most of the time. So we want to clean all that up, take that information, and actually build out our HTML here and say, this is, uh, the final assessment.

Once we get through the analysis, again, we have our A-Ring some more. This is our speech recognition, and we also have our webcam; again, all of this is just in the browser. There's also the ability to use a debug panel: I'm basically just hijacking console.log and adding some styling so I know what's happening in the debug.

But this is where the heart rate monitor is. We have the request for the device, and you can see right here that we're filtering the services we're actually looking to explore. The service that we set up with the advertiser is specifically a heart rate service; we can also look at other devices if we want to include them. Based on that, we can tell the Bluetooth API that we're only looking for these specific devices, and that's why it's limiting our list. Then what we want to do is actually connect to this service. You can see that we're checking first whether the device is working, or whether it's GATT-connected, and then we're actually trying to connect. We're determining any device information that we can, like the manufacturer, and then we're starting to log the information coming back from the heart rate monitor. So again, we're just taking all the measurements and broadcasting those changes out to JavaScript.

And so again, this is our final output. We can see, based on the LLM, that everything's coming back. It's still tracking our heart rate; it's gone down a lot, again because of the wave. And that's how we're sending real-time information, both voice and heartbeat over Bluetooth, to an LLM completely through the browser.

I'm probably going to do some more experiments with this. If you have any ideas, I'm super open to them; I think I might try to do something with, uh, facial recognition. So stay tuned. All right, that's it for us today, everyone. Thanks for checking out this experiment with the public speaking web AI coach, and with that, happy nerding.

---

*Generated for LLM consumption from nerding.io video library*