Real-Time Medical Transcription Analysis Using AI - Python Tutorial


00:00
Speaker A
In this video, we'll build a medical transcription analysis app, which will be able to take in medical spoken data in real time and also understand key information within it.
00:11
Speaker A
Let's take a look at exactly how our application will work.
00:14
Speaker A
I'm going to click the button to start recording, and I'm going to pretend to be a doctor talking about a recent patient visit that I've just had.
00:23
Speaker A
Today I met with patient Mr. Garcia, who is a 45-year-old man.
00:28
Speaker A
He has been dealing with a cough for about two weeks, and he mentioned some mild chest pain.
00:34
Speaker A
He did not have any fever, and his breathing was fine. However, I've prescribed a course of antibiotics to cover any possible bacterial infection.
00:43
Speaker A
I've also ordered a chest X-ray and some blood work.
00:47
Speaker A
We'll be meeting again next week to follow up to see how he's doing.
00:50
Speaker A
So, our application transcribes everything that I'm saying in real time, and it also identifies key medical information such as medical conditions, any medicines which are being prescribed, or any tests and procedures that are needed.
01:05
Speaker A
To build this
01:06
Speaker A
All you need to do is make use of Assembly AI's real-time transcription, as well as its large language model framework, LeMUR, which enables us to use different large language models, for example, Claude 3.5 Sonnet.
01:59
Speaker A
There are two things you need to do to start building this application. First off, you want to download the GitHub repo for this project, and I'll be leaving the link for this GitHub repository in the description box below.
02:10
Speaker A
And the second thing you want to do is sign up for a free Assembly AI API key. The link in the description box below will allow you to do that, and it also gives you $50 worth of free credits to get started.
02:21
Speaker A
There are three important libraries that we need to download for this project.
02:25
Speaker A
First of all, it's Assembly AI, so you want to run this command pip install "assemblyai[extras]" in your terminal.
02:38
Speaker A
And also, you want to install PortAudio as well.
02:40
Speaker A
So, let's actually go ahead and copy this.
02:42
Speaker A
And head on over to terminal.
02:44
Speaker A
I've gone ahead and created a virtual Python environment where I will be downloading all of these libraries.
02:49
Speaker A
So, once I'm there, I'm just installing the libraries that are needed.
02:52
Speaker A
So, I've installed Assembly AI extras.
02:55
Speaker A
And also, now I'm going to install PortAudio.
02:59
Speaker A
Finally, we also want to install Flask.
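Taken together, the setup commands might look something like this. This is a sketch: the PortAudio command depends on your operating system, and the virtual environment step is optional.

```shell
# Create and activate a virtual Python environment (optional but recommended)
python -m venv venv
source venv/bin/activate

# 1. AssemblyAI SDK with the audio extras
pip install "assemblyai[extras]"

# 2. PortAudio (a system library; the command varies by OS)
brew install portaudio                    # macOS
# sudo apt-get install portaudio19-dev    # Debian/Ubuntu

# 3. Flask for the web UI
pip install flask
```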
03:03
Speaker A
Now that you've installed these three libraries, we can head on over to Visual Studio Code.
03:08
Speaker A
Open up that project folder that you've downloaded from GitHub, and we can start writing our code.
03:12
Speaker A
Once you're in Visual Studio Code, in the project folder that you've just downloaded, you should see three main components.
03:20
Speaker A
First of all, it's app.py, where we write the main logic of our code.
03:26
Speaker A
And then, next off is index.html, which contains the HTML code for our UI and also handles the communication between our app.py file and the front end.
03:33
Speaker A
We also have styles.css, which just contains the CSS for our UI.
03:39
Speaker A
Now, let's go ahead and open app.py.
03:41
Speaker A
I've broken down our code into six different steps.
03:46
Speaker A
Steps one, two, and three are already written, and we will be writing steps four to six.
03:52
Speaker A
Before we start that, let me actually walk you through the first three steps.
03:55
Speaker A
So, in step one, what we're doing is importing all of the Python libraries that we require for this project.
04:01
Speaker A
And that namely includes Flask and Assembly AI.
04:05
Speaker A
Next, we also want to define our Assembly AI API key.
04:08
Speaker A
So, here's exactly where you should be doing that.
04:10
Speaker A
Next, we have a few global variables that we want to define at the very top, which is our transcriber object as well as our session ID.
04:16
Speaker A
In step number three, we have our prompt.
04:19
Speaker A
So, this prompt is really important, and what it does is it essentially tells the large language model that you are a medical transcript analyzer.
04:30
Speaker A
Your task is to detect and format words and phrases that fit into the following five categories.
04:34
Speaker A
And these five categories are essentially what we're looking out for in the transcript.
04:40
Speaker A
And what we have stated in our UI as well.
04:42
Speaker A
So, that includes protected health information, medical conditions, anatomy, medicines, and also tests and procedures.
04:50
Speaker A
So, for each of these categories, when the large language model identifies them, it should format the text that it returns to us.
05:00
Speaker A
So, when it receives a transcript and processes it,
05:03
Speaker A
it makes sure to format that text with HTML tags.
05:08
Speaker A
And that's essentially how we are able to display the highlights or different formattings on our application.
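An illustrative version of that step-three prompt might look like this. The wording here is a guess at the repo's actual prompt, and the HTML tag names are assumptions.

```python
# Step 3: the prompt sent to the LLM along with each transcript sentence.
PROMPT = """You are a medical transcript analyzer. Your task is to detect and
format words and phrases that fit into the following five categories:

1. Protected health information (names, ages, identifiers)
2. Medical conditions
3. Anatomy
4. Medicines
5. Tests and procedures

Wrap each detected phrase in an HTML tag so the UI can highlight it, for
example: <span class="medicine">ibuprofen</span>. Return only the formatted
transcript text, with no extra commentary."""
```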
05:14
Speaker A
Now, let's move on to step number four, which is real-time transcription.
05:17
Speaker A
First off, we are going to start off by defining the real-time transcriber.
05:29
Speaker A
So, this is the code for defining our real-time transcriber object.
05:39
Speaker A
In this, we define our sample rate, as well as what actions we require it to do when it receives data.
05:48
Speaker A
When an error occurs, when we first open the real-time transcriber, and when we close it.
05:52
Speaker A
So, these are all methods which are right here, and we'll be filling them out right now.
05:57
Speaker A
In the on open method, we want to define the behavior of our real-time transcriber when we start a session.
06:02
Speaker A
Here, we want to define the session ID.
06:09
Speaker A
Next is the on data method.
06:11
Speaker A
In the on data method, we want to define what we want to do with the real-time transcript which is coming in.
06:16
Speaker A
So, let's actually start writing that out.
06:23
Speaker A
So, what we have just defined is that if we don't receive any text, or if we receive empty text, don't do anything.
06:28
Speaker A
Next, in the event that we receive a real-time final transcript, which refers to a fully uttered sentence: whatever you have said before taking a pause of at least 700 milliseconds.
06:40
Speaker A
So, all of that text, what we want to do is send that full final transcript right into this method called analyze transcript.
06:47
Speaker A
Now, in the else branch, what we're saying is: if you didn't receive a final transcript, you're getting a partial transcript.
06:55
Speaker A
And in this case, a partial transcript is what you're saying, word by word.
07:00
Speaker A
So, for those individual words, what we want to do is just display them in our UI.
07:10
Speaker A
In these two methods, we're defining what we want to do in the case of an error, as well as when we close our transcriber.
07:16
Speaker A
Now, let's go back over to transcribe real time.
07:20
Speaker A
And complete this method.
07:30
Speaker A
So, here we're connecting the transcriber to our microphone, and we're streaming it.
07:36
Speaker A
Next, in step number five, we're going to be analyzing our transcript.
07:39
Speaker A
Every time we receive a real-time final transcript, we send it over to this method, which will then send our transcript over to the large language model to get it analyzed.
07:47
Speaker A
For the step, we'll be making use of Assembly AI's LeMUR framework.
07:53
Speaker A
And LeMUR allows you to use a bunch of different large language models, and in this case, we'll be using Claude 3.5 Sonnet.
08:00
Speaker A
In this case, first off, we're calling on LeMUR's task method.
08:06
Speaker A
And to do that, we need to pass in three parameters. The first is our prompt.
08:11
Speaker A
That's the prompt we've written here at the very beginning.
08:14
Speaker A
The second parameter that we're passing in is the input text, which in this case is our transcript for every single sentence that we're uttering.
08:21
Speaker A
We're going to be passing that in.
08:23
Speaker A
And lastly, we're defining that we want to make use of Claude 3.5 Sonnet in order to do this.
08:29
Speaker A
At the end of this, once we get our result back from LeMUR, we want to pass that to our front end.
08:34
Speaker A
So, again, we're going to be making use of SocketIO to pass that result.response, which is the text, back into our front end.
08:41
Speaker A
Now, we're in the final step where we'll put all of these pieces together to define the logic of our overall app.
08:55
Speaker A
So, first off, we are rendering our HTML.
09:00
Speaker A
After that, we're creating a method called handle toggle transcription, and essentially what this does is every time we click on the button to start transcribing in real time.
09:12
Speaker A
It first checks if a transcriber is already in place.
09:17
Speaker A
If so, it closes that one and then starts a new one, and it also ensures that this is thread safe.
09:24
Speaker A
After saving everything that we've done.
09:27
Speaker A
I'm heading on back into terminal to run our code.
09:40
Speaker A
Once I open it on the local address, this is what our application looks like.
09:44
Speaker A
So, let's start recording.
09:48
Speaker A
Today I met with a patient, Jane Smith, who is 32 years old.
09:53
Speaker A
She has been experiencing frequent headaches and some dizziness over the past few days.
09:58
Speaker A
I've prescribed some ibuprofen to deal with her pain, and I've told her to keep a headache diary so she keeps track of whenever she gets headaches.
10:07
Speaker A
We'll review her symptoms in two weeks to determine the next steps.
10:12
Speaker A
So, here's our application, which does medical transcription analysis in real time.
10:20
Speaker A
And it's also able to identify key medical information in real time, which is extremely helpful for healthcare professionals.
10:28
Speaker A
In the next part of our application, we'll take a look at how we can save this key information directly into Google Sheets, and how we can deploy this application to the cloud.
10:37
Speaker A
If you're interested in watching the next part of the medical transcription analysis project, check out the video
