# Steps to Create Music With Transformer.js

## Metadata

- **Published:** 4/26/2024
- **Duration:** 11 minutes
- **YouTube URL:** https://youtube.com/watch?v=-c0WnXylgSw
- **Channel:** nerding.io

## Description

🎵 Create AI-generated music in your browser with Transformer.js! 🎧

In this video, we explore the power of Transformer.js for generating unique musical compositions. You'll learn how to:

- Set up Transformer.js
- Generate and customize melodies, harmonies, and rhythms
- Integrate AI-generated music into your web apps

Whether you're a music enthusiast or a web developer, discover the future of music creation with Transformer.js.

🚀 Course: https://forms.gle/PN3RpKD8eHsicL9Z7
📰 News & Resources: https://sendfox.com/nerdingio
📞 Book a Call: https://calendar.app.google/M1iU6X2x18metzDeA

🎥 Chapters
00:00 Introduction
00:20 Demo
03:07 Model
04:48 Code
08:03 Generate
10:16 Final

🔗 Links
https://huggingface.co/spaces/Xenova/musicgen-web
https://huggingface.co/Xenova/musicgen-small
https://github.com/xenova/transformers.js/tree/v3

⤵️ Let's Connect
https://everefficient.ai
https://nerding.io
https://twitter.com/nerding_io
https://www.linkedin.com/in/jdfiscus/
https://www.linkedin.com/company/ever-efficient-ai/

## Key Highlights

### 1. Music Generation in the Browser

The video demonstrates how to generate music directly in the browser using Transformer.js, eliminating backend costs and dependencies.

### 2. Leveraging Pre-trained Models

The tutorial uses the MusicGen small model from Hugging Face, showing how to download and run a pre-trained model for music generation.

### 3. Client-Side Processing with WASM

The generation runs on WebAssembly (WASM) for client-side computation. WebGPU is mentioned as an upcoming backend that could significantly reduce latency.

### 4. Implementation in JavaScript

The video walks through the JavaScript code needed to load the model, process the prompt, and generate the audio, writing the result to a .wav file with the 'wavefile' npm package (a condensed sketch of this pipeline follows these highlights).

### 5. Adapting Hugging Face Spaces Code

Although the HF Spaces code is compiled, the presenter shows how to clone the repo, identify the underlying model from its README, and rebuild the approach in a local JavaScript setup.
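To make these highlights concrete, here is a condensed sketch of the core pipeline, modeled on the Transformers.js snippet from the Xenova/musicgen-small model card that the video adapts; the prompt text and parameter values are illustrative, not the exact code shown on screen:

```js
import { AutoTokenizer, MusicgenForConditionalGeneration } from '@xenova/transformers';

// Load the tokenizer and the MusicGen small checkpoint from the Hugging Face Hub
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
const model = await MusicgenForConditionalGeneration.from_pretrained('Xenova/musicgen-small');

// Tokenize a text description of the music to generate
const inputs = tokenizer('80s pop track with bassy drums and synth');

// Generate a clip; the result is a tensor of raw audio samples that can be
// played in the browser or written to a .wav file (see the transcript below)
const audio_values = await model.generate({
  ...inputs,            // spread the tokenized inputs into the call
  max_new_tokens: 500,  // controls the length of the generated clip
  do_sample: true,      // sample instead of greedy decoding
  guidance_scale: 3,    // classifier-free guidance strength
});
```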
## Summary

Here's a comprehensive summary document for the "Steps to Create Music With Transformer.js" video:

**1. Executive Summary**

This video tutorial demonstrates how to generate AI music directly in the browser using Transformer.js, specifically leveraging the MusicGen small model from Hugging Face. It covers setting up the environment, adapting the pre-trained model for use in JavaScript, and generating a .wav music file client-side.

**2. Main Topics Covered**

* **Introduction to Transformer.js for Music Generation:** Overview of using Transformer.js for creating music in the browser.
* **Demo Walkthrough:** Exploring the MusicGen demo on Hugging Face Spaces.
* **Model Explanation:** Details about the MusicGen small model and its implementation.
* **Code Implementation:** A step-by-step guide to setting up a JavaScript environment (using npm), importing the necessary libraries (transformers.js, wavefile), and writing the code that loads the model, processes the prompt, and generates music.
* **Generating Music:** Running the JavaScript code to produce a .wav file and testing the output.
* **Adapting Hugging Face Spaces Code:** How to clone the repo, reverse-engineer the compiled demo, and adapt it for a local JavaScript setup.

**3. Key Takeaways**

* **Client-Side Music Generation:** Transformer.js enables music creation entirely within the browser, eliminating backend costs.
* **Pre-trained Models:** Hugging Face's MusicGen small model is straightforward to integrate into projects.
* **WebAssembly (WASM) Power:** WASM handles the client-side computation today; the upcoming WebGPU backend could significantly reduce latency.
* **Practical Implementation:** Provides a clear code example to download, set up, and use Transformer.js for music generation.
* **Reverse Engineering a Compiled Implementation:** The presenter explains how to start from a compiled Hugging Face Space and rebuild its functionality for local use.

**4. Notable Quotes or Examples**

* "The fact that this is happening in the browser and happening for free is super cool."
* On adapting the Hugging Face implementation: even though the Spaces code is compiled, the README still names the underlying model, which is enough to rebuild the demo locally.
* Code examples throughout the video show how to import the libraries, load the model, and generate the .wav file.

**5. Target Audience**

* Web developers interested in integrating AI-generated music into their applications.
* Music enthusiasts curious about the capabilities of AI in music creation.
* Individuals familiar with JavaScript, npm, and basic machine learning concepts.

## Full Transcript

Hey everyone, welcome to nerding.io. I'm JD, and today we're going to be taking a look at music generation, specifically doing it in the browser with Transformer.js. First we're going to take a look at the demo that they put together, and then we'll learn how to implement it ourselves in JavaScript. With that, let's go ahead and get started.

All right, so the first thing we're going to do is look at the MusicGen implementation with Transformer.js and see the demo they put together. This is on their Hugging Face Spaces, and they released a post about it a few days ago, maybe even a couple of weeks ago. The way it works is you have a series of controls: you can pick your duration and guidance, and you can type in a prompt for the kind of music you want to generate. The first thing to note is that this is already loaded. If I refresh, it's really quick to load the models; it may be a little slower the first time, but the second time it's cached, so it's pretty quick. You can see that in the Network tab if you filter for WASM. We'll do a refresh again and watch it load; this is the ONNX runtime being pulled in.

Now we're just going to have it generate with the defaults and see how long it takes. All right, it's finished generating right here, and we're going to test it out. Just note that I paused the video; it took about a minute, maybe a minute and thirty seconds, to generate about 10 seconds of audio. And there we have it: our 10 seconds of generated music. Again, this is all being done for free. You're loading the model into the front end, so it doesn't cost you anything to hit any kind of open-source model, and it doesn't cost you anything for a back end; you're literally doing all of this in the client. There is some initial load time, but the fact that it can be done in the client is really impressive. Again, it's using WASM. In an upcoming video we'll probably go through WebGPU, which is way, way faster and should make much of this latency a thing of the past. So again, the fact that this is happening in the browser and happening for free is super cool.
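For context on the WebGPU remark: Transformers.js v3 introduced a device option on from_pretrained, so a sketch of targeting WebGPU instead of WASM might look like the following (the 'webgpu' identifier is taken from the v3 documentation and assumes browser support; treat it as an assumption rather than verified against this video):

```js
import { MusicgenForConditionalGeneration } from '@xenova/transformers';

// Sketch: ask the v3 runtime for the WebGPU backend instead of the WASM default
const model = await MusicgenForConditionalGeneration.from_pretrained(
  'Xenova/musicgen-small',
  { device: 'webgpu' }, // assumed v3 device identifier; requires a WebGPU-capable browser
);
```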
So now we're going to try to break this down and figure out how they actually put it together. One thing to note: if you go and look at the files here, they're compiled down, so you can't really see the code. But what we can do is look at the model they used and work out an example from it. If you come to the activity page and scroll down to the models section, you can find musicgen-small. We're going to take this model and build out an example. If we go in here, we can see how to set up our application, and they even give us a code snippet for how to do that. So in the next steps we're going to implement this in JavaScript.

Real quick, everyone: if you haven't already, please remember to like and subscribe; it helps more than you know. We also have an upcoming course on white-labeling GPTs, so if you're interested, please sign up below. With that, let's get back to it.

All right, we're going to go back to the Space and look at the files. Even though these files aren't totally usable, there are a couple of interesting things. One, you can still get the HTML; the other is that you can see the README, and it lists the models. We're going to clone this: as good practice, you can click "clone repository," copy the command, and that's how we'll start our project. I've already done this, and we'll jump into VS Code in a second. The other thing to pay attention to is the npm package. It's a little different from how you would normally run the npm install for Transformers; you'll notice it has this version three, or v3. After we've cloned (which is the project I have right here), we're going to do a couple of things with npm. The first thing I did was run npm init, which gives me this package.json; the reason for that is I just want to track which packages I'm installing. Then we go back, grab the Transformers v3 install command, copy it, and run our npm install. After that ran, I want to point out again that this package version is a little different because we're installing the v3 build, so it's not semantic versioning, necessarily.

All right, now we're going to create our index module file, just as an example of how we would do this in JavaScript. Let's go back and look at the code. We know we're going to have this Transformers import, so we copy it and look at it in our IDE. You'll also notice down here there's an import for generating a wave file; since we're only doing this in JavaScript, we're actually going to write that wave file out so we have a real result. If I go into my file, you can see I've already added the import statements, both for the audio output and for the AutoTokenizer and the Musicgen conditional-generation model.

Now we're going to load the tokenizer and the model itself. Again, we're pointing at this specific model; we found it in the README, and also under the models on their Hugging Face page. We define which model we want, load the pre-trained weights, and pass in some variables based on the encoding. If you look back at the example, there are these different controls for temperature and the different prompts, and those are the kinds of things we account for in configurations like the encoding; they matter for how the audio is actually generated.
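The loading options referred to here ("variables based on the encoding") appear in the model card's snippet as per-module dtype settings, i.e., how aggressively each sub-model is quantized. A sketch with those documented values follows; note that at the time, the v3 build was installed from the GitHub branch rather than a semver release (along the lines of npm install xenova/transformers.js#v3), which is why the version looks unusual:

```js
import { AutoTokenizer, MusicgenForConditionalGeneration } from '@xenova/transformers';

// Tokenizer for the MusicGen small checkpoint found in the Space's README
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');

// Load the pre-trained model with per-module quantization; these dtype values
// follow the Xenova/musicgen-small model card and trade quality for speed/size
const model = await MusicgenForConditionalGeneration.from_pretrained(
  'Xenova/musicgen-small',
  {
    dtype: {
      text_encoder: 'q8',          // 8-bit quantized text encoder
      decoder_model_merged: 'q8',  // 8-bit quantized audio-token decoder
      encodec_decode: 'fp32',      // full precision for EnCodec waveform decoding
    },
  },
);
```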
Let's keep going. Now we're building out our prompt. The prompt here would be the text that goes into the input field. We take those tokenized inputs, and the audio is generated from them in chunks, essentially; that's why we use a spread operator on the inputs, so we can pass all of that data in. We set our max new tokens to 500, we enable sampling, and we set our guidance scale. That's actually enough to generate the audio. However, so we can hear a proper result, we're going to add in wavefile so we can write the audio out. The only other package we haven't installed is wavefile, so let's do that as well: let me clear this so it's easier to read, then npm i wavefile, and that saves it to our package.json.

Now we should have everything we need to run this and see what's happening in the output. If we run node index (our module file), we'll start seeing the result. I'll pause part of the video, because it takes a second to actually generate the wave file. All right, I ended up using my actual terminal, but you can see the output it's giving; I'll try to make that a little bigger. Based on all of this information it's concatenating the audio, and you can now see we have a musicgen .wav file. That comes from the writeFileSync call, where we take the buffer from the WaveFile we built from scratch and write a file from it.
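Putting the walkthrough together, a sketch of the complete index module would look roughly like the block below. The prompt text and output filename are illustrative, and the API usage follows the Xenova/musicgen-small model card rather than the exact (not fully visible) code on screen:

```js
// index.mjs, run with: node index.mjs
// (top-level await needs an ES module: use the .mjs extension or set
// "type": "module" in package.json)
import { AutoTokenizer, MusicgenForConditionalGeneration } from '@xenova/transformers';
import wavefile from 'wavefile';
import fs from 'fs';

// Load the tokenizer and the quantized model (dtype values as in the earlier sketch)
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
const model = await MusicgenForConditionalGeneration.from_pretrained(
  'Xenova/musicgen-small',
  { dtype: { text_encoder: 'q8', decoder_model_merged: 'q8', encodec_decode: 'fp32' } },
);

// Tokenize the prompt (in a UI this would come from the input field)
const inputs = tokenizer('a light and upbeat synth-pop track');

// Generate the audio samples; spreading the tokenized inputs passes
// input_ids and attention_mask alongside the generation parameters
const audio_values = await model.generate({
  ...inputs,
  max_new_tokens: 500, // roughly the 10 seconds of audio seen in the video's run
  do_sample: true,
  guidance_scale: 3,
});

// Build a mono, 32-bit float WAV at the model's sampling rate (32 kHz for
// MusicGen small) and write the buffer to disk, as the transcript describes
const wav = new wavefile.WaveFile();
wav.fromScratch(1, model.config.audio_encoder.sampling_rate, '32f', audio_values.data);
fs.writeFileSync('musicgen_out.wav', wav.toBuffer());
```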
Let's give it a quick test... and now we have our 9 seconds of generated music, straight out of the web stack. This is all JavaScript; even though we ran it through the CLI, you could wire this file into your HTML to build out the same implementation they have over here: input fields, buttons to change the prompt, your scales, and a generate button. Again, all of this can happen in the browser, and that's one of the coolest things about it. Another easy way to do this is to use it directly on Hugging Face, but now you also have a way to implement it in your own applications.

All right, that's it for us today. Again, if you haven't liked and subscribed, please remember to do so. What we covered today was music generation, how you actually use AI in the browser to do it, and a quick implementation we can build on top of. With that, happy nerding!

---

*Generated for LLM consumption from nerding.io video library*