Article
· Jul 4, 2023 · 3 min read

Using AI to Simplify Clinical Documents Storage, Retrieval and Search

Problem

In a fast-paced clinical environment, where quick decision-making is crucial, the lack of streamlined document storage and access systems poses several obstacles. While document storage solutions exist (e.g., FHIR), accessing and meaningfully searching for specific patient data within those documents can be a significant challenge.

Motivation

AI has made document search remarkably powerful. Question answering over documents has never been easier, with open-source tools like Chroma and LangChain that store vector embeddings and query them through generative AI APIs. With more dedicated effort, organizations are indexing their existing documents and building fine-tuned versions of GPT for enterprise purposes. Andrej Karpathy's talk State of GPT provides an excellent overview of this topic.

This project was our attempt at reducing friction across all touch points where a clinician might interact with documents. From input and processing to storage and retrieval, we've leveraged IRIS FHIR and AI to help clinicians store and find the information they need effortlessly.

Solution

We've built a full-stack web app that allows clinicians to record voice notes. These notes can then be transcribed and summarized using OpenAI and stored in FHIR servers. The stored documents are then indexed and made available for semantic search.

Demo Video

Key Features

  1. Web app - To view clinical information about patients, observations and encounters. This is built using Vue.js.
  2. Voice transcription - The OpenAI Whisper API is used to transcribe voice recordings accurately to text.
  3. Text summarization - The transcribed content can then be summarized and given a title in the required format, with specific sections such as symptoms, diagnosis, and so on. This is achieved using the OpenAI text completion API with the text-davinci-003 model.
  4. Document storage - The summarized documents are then stored in FHIR using the Document Reference artifact.
  5. Semantic document search - The stored documents are chunked and indexed in Chroma. This limits the search space and keeps GPT token usage low during semantic search with LangChain. Currently, we load the documents at search time because only a small number of documents are available; this could be changed to index in the background asynchronously.
  6. Documents export - Finally, there is an option to export documents to Google Docs and other data to Google Sheets. Users can log in with their specific accounts using OAuth and export the documents for easier collaboration and communication with other clinicians or patients.
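To illustrate the chunk-and-index step behind semantic search, here is a minimal, framework-free sketch of fixed-size chunking with overlap. The project itself relies on LangChain's splitters, so the function name and sizes below are illustrative assumptions, not the actual implementation:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks; overlap keeps context across boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Each chunk would then be embedded and stored in Chroma for semantic search.
chunks = chunk_text("A" * 1200, chunk_size=500, overlap=50)
print(len(chunks))  # 3
```

Overlap matters because a sentence cut at a chunk boundary would otherwise be invisible to a query matching its second half.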

Try it out

Clone the project repository from the following GitHub link: https://github.com/ikram-shah/iris-fhir-transcribe-summarize-export. Follow the provided instructions to set up the project locally on your machine. Let us know if something doesn’t work as expected.

Thoughts and Feedback

Advanced language models available today combined with the massive volume of data available hold immense potential to revolutionize healthcare, especially in the documents space. Let us know your thoughts and any feedback below. We will follow up with more posts on the technical details behind this project.

Vote for our app in the Grand Prix contest if you find it promising!

Article
· Jul 4, 2023 · 10 min read

A portal to manage storage made with Django - Part 1

Our objective

In the last article, we talked about a few starters for Django. We learned how to begin the project, ensure we have all the prerequisites, and make a CRUD. However, today we are going a little further.
Sometimes we need access to more complex methods, so today we will connect IRIS to a Python environment, build a few functions, and display them on a webpage. It will be similar to the last discussion, but it will go far enough for you to build something new without feeling lost.
In this project, we will gather information about the globals in IRIS and track their sizes to understand the footprint of each namespace and table in MB.

 

The Project

Every step here is available on my GitHub Repository, in the commits history.

Starters

We are going through a few steps like those in the last article, so this part should be familiar.

  • Go to the desired directory on the terminal and type the following.

django-admin startproject globalSize

  • Add the requirements.txt with the following text, and type

pip install -r requirements.txt

to ensure you have the requirements.

# Django itself
django>=4.2.1
# InterSystems IRIS driver for Django
django-iris==0.2.2
  • In globalSize/settings.py, add IRIS on the DATABASES configurations:
DATABASES = {
	'default': {
		'ENGINE': 'django_iris',
		'NAME': 'USER',
		'USER': '_system',
		'PASSWORD': 'SYS',
		'HOST': 'localhost',
		'PORT': 1972,
	}
}
  • Don’t forget to add a .gitignore and a .editorconfig. It is also convenient to have a linter of your preference, but it is beyond the scope of this article to discuss it.

 

Creating the app and model

We have created an app and a model in the last article, so this section should also be familiar, even though it is a different app and model.

  • To create the app, type

python manage.py startapp globals

  • In globals/models.py, create a model with the information you want to display about your globals:
class irisGlobal(models.Model):
	database = models.CharField(max_length=40)
	name = models.CharField(max_length=40)
	allocatedsize = models.FloatField()
	size = models.FloatField()


	def __str__(self):
		return self.name
  • In settings.py, add the new app to the INSTALLED_APPS:
INSTALLED_APPS = [
	...,
	'globals',
]

 

Setting URLs and the home page

Again we are going through a few more steps very similar to the last article.

  • In globalSize/urls.py, import the function include from django.urls and add a new path to globals.urls in urlpatterns.
from django.urls import path, include
urlpatterns = [
    ...,
    path('globals/', include('globals.urls')),
]
  • Create the URLs for the app, adding the file globals/urls.py with the following text.
from django.urls import path
from .views import home
urlpatterns = [
	path('', home),
]
  • Create the view we imported in the last step. In views.py, add the function below.
from django.shortcuts import render
def home(request):
	return render(request, "index.html")
  • Finally, add the file globals/templates/index.html and generate the front page as desired. Check the example below:
<!DOCTYPE html>
<html>
  <body>
    hello world!
  </body>
</html>

If you enter the commands below and follow the link http://127.0.0.1:8000/globals/ you will already have a page displaying “hello world!”.

python manage.py makemigrations
python manage.py migrate
python manage.py runserver


Displaying the globals in the admin and home pages

  • In admin.py, import the model and register it.
from .models import irisGlobal
admin.site.register(irisGlobal)
  • Import the model in views.py and return it in the function.
from .models import irisGlobal
def home(request):
	globals = irisGlobal.objects.all()
	return render(request, “index.html”, {“globals”: globals})
  • Now we can access the globals from the index.html as preferred. See the example below.
<h3>ID  -  DATABASE          /       GLOBAL       - Size / Allocated</h3>
<ul>
  {% for global in globals %}
  <li>
    {{ global.id }} - {{ global.database }}    {{ global.name  }}  -  {{ global.size  }}  /  {{ global.allocatedsize }}
  </li>
  {% endfor %}
</ul>

 

Retrieving data

At this point, we have the project ready to be loaded with information. There are many ways this can be shaped, but I will use a Python approach so we can learn a new solution that can be integrated with Django and IRIS.
We need a few methods to retrieve all the data. We can use InterSystems IRIS Cloud SQL with the DB-API driver to connect to the instance we want to analyze - it doesn’t have to be the same as where we connected Django.
Organizing the code in a new folder that we can treat as a module is good practice. To do that, create the folder api inside globals, add an empty __init__.py file so that Python recognizes it as a module, and start writing the file that will contain the methods. We can call it methods.py.


 

Create the connection

To connect our Python environment to InterSystems IRIS, we should follow the steps described in the section "ObjectScript in Python Environment" of the previous article, Python and IRIS in practice.
From here on, it is simple: we import iris, pass the connection address (the IRIS instance we want to analyze, in the format host:port/namespace), a username, and a password to the iris.connect method, and create Python's IRIS object. Have a look at the code below.

import intersystems_iris as iris
from django.db import connection as djangoconnection

# connection by iris
conn_params = djangoconnection.get_connection_params()
conn_params["namespace"] = "%SYS"
connection = iris.connect(**conn_params)
irisPy = iris.createIRIS(connection)

 

Getting database directories

Since we want to retrieve the globals' sizes, we need (of course) their sizes, their names, and their addresses, also known as databases.
I will show a simplified version of the function, but remember that verifying every step and connection, and throwing an exception if something goes wrong, is good practice.
Just like we would do in ObjectScript, we need a SQL statement that we can prepare and execute to retrieve a list containing all the database directories in its result set. We can do all that easily with the functions "irisPy.classMethodSomething()", where Something stands for the type the method should return, and irisObject.invoke(), where we can access anything from the referred irisObject. Take a look at the following example.

def getAllDatabaseDirectories():
    try:
        # check the connection made in irisPy, and if it is set to the %SYS namespace
        databaseDirectoriesList = []
        with connection.cursor() as cursor:
            cursor.execute("SELECT DISTINCT %EXACT(Directory) FROM Config.Databases WHERE SectionHeader = ?", ["Databases"])
            databaseDirectoriesList = [row[0] for row in cursor]

    except Exception as error:
        return str(error)

    return databaseDirectoriesList

In the next function, the statement variable is set to an object generated by the %New method of the IRIS %SQL.Statement class. Then it is possible to invoke a prepare method on the instantiated object, with a query as an argument. Next, we can invoke the %Execute and %Next methods to perform the query and loop through its result set, appending the desired information to a Python list for easy access.
It is easy to find every database directory in the Config.Databases table, located only in the %SYS namespace of every IRIS instance. Check it out in the Management Portal if you want; there is some more interesting information there.

 

Getting all globals from a database

This function is very similar to the previous one; however, this time there is a class query ready to use. Once again we need a SQL statement, so we prepare the DirectoryList query from the %SYS.GlobalQuery class. Next, we execute it with a database directory as an argument and retrieve a list containing all the globals from that database.

def getGlobalsList(databaseDirectory: str):
    try:
        statement = irisPy.classMethodObject("%SQL.Statement", "%New")
        status = statement.invoke("%PrepareClassQuery", "%SYS.GlobalQuery","DirectoryList")

        result = statement.invoke("%Execute", databaseDirectory)

        globalList = []
        while (result.invoke("%Next")!=0):
            globalList.append(result.invoke("%Get", "Name"))

    except Exception as error:
        return str(error)

    return globalList


Getting globals sizes and allocated sizes

Finally, we can access the target information. Fortunately, IRIS has a built-in method to retrieve the size and allocated size if you provide a database and global pair.

def getGlobalSize(databaseDirectory: str, globalName: str):
    try:
        globalUsed = iris.IRISReference(0)
        globalAllocated = iris.IRISReference(0)
        status = irisPy.classMethodObject("%GlobalEdit", "GetGlobalSize", databaseDirectory, globalName, globalAllocated, globalUsed, 0)

    except Exception as error:
        return str(error)

    return (globalUsed.getValue(), globalAllocated.getValue())

This time, we need the IRISReference(0) function from the iris module to receive the sizes from the "GetGlobalSize" method by reference. Then, we can access each value with the getValue() method.

 

Showing everything on the front page

Finally, we can use these functions to display the data on the front page. We already have a way to iterate through the information and a table to show it, so we only need to populate it. I will create an update button to do that.
First, we add a link to the index.html.

<body>
  <a href="{% url 'update' %}">update</a>
</body>

Add the link to the urlpatterns list, in urls.py.

from .views import home, update
urlpatterns = [
    path('', home),
    path('update', update, name="update"),
]

Then, create the view, in views.py.

from django.shortcuts import render, redirect
from .api.methods import *
def update(request):
    irisGlobal.objects.all().delete()
    databaseList = getAllDatabaseDirectories()

    for database in databaseList:
        globalList = getGlobalsList(database)

        for glob in globalList:
            used, allocated = getGlobalSize(database, glob)
            irisGlobal.objects.create(database=database, name=glob, size=used, allocatedsize=allocated)

    return redirect(home)

 

For this view, we must first import the redirect function from django.shortcuts, and the methods we just built.
It is a good idea to delete any previous data on the table so that eventually deleted globals will vanish. Since the global count is probably not gigantic, it is better to do it this way than to check each record to see whether it was deleted or needs an update.
Then, we get all the database directories so we can, for each database, check all the globals in it, and for each global, we can have their used and allocated size.
At this point, we have the Django model populated and ready to retrieve data, so we redirect to the home view.
If you access http://127.0.0.1:8000/globals/ and click the update link we added, the page should reload, and in a few seconds it will display the list of globals with their databases, sizes, and allocated sizes, like the image below.


 

Adding some aggregation

You would be surprised how simple it is to add a few quick analysis options, such as a sum or count. You do not need to master Django to create a few dashboards on this page, and after this section you should be in a good place to start.
We already know that the home view is responsible for rendering the index. Up until now, we have generated the variable "globals", containing all the data, and passed it to index.html. We will do something similar but with aggregation functions: we will create a variable for each sum, use the aggregate() and Sum() functions, and add them to the context dictionary argument of the render function. And of course, don't forget to import Sum from django.db.models. Check the function below.

def home(request):
	globals = irisGlobal.objects.all()
	sumSize = globals.aggregate(Sum("size"))
	sumAllocated = globals.aggregate(Sum("allocatedsize"))
	return render(request, "index.html", {"globals": globals, "sumSize": sumSize, "sumAllocated":sumAllocated})
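For reference, aggregate() returns a dictionary keyed by field name plus "__sum", which is why the template accesses sumSize.size__sum. Here is a plain-Python illustration of that shape (not Django code; the rows and values are made up):

```python
# Emulating the dict shape Django's aggregate(Sum("size")) returns.
rows = [
    {"name": "^oddDEF", "size": 1.25, "allocatedsize": 2.0},
    {"name": "^rINDEX", "size": 0.25, "allocatedsize": 1.0},
]

sumSize = {"size__sum": sum(r["size"] for r in rows)}
sumAllocated = {"allocatedsize__sum": sum(r["allocatedsize"] for r in rows)}

print(sumSize)       # {'size__sum': 1.5}
print(sumAllocated)  # {'allocatedsize__sum': 3.0}
```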


Now we can update the index.html file, adding some paragraphs below the list (the <ul> element). Inside those paragraphs, we can access the count of all globals and the sums, as shown below.

<p>showing results for {{globals.count}} globals</p>
  <p>total size: {{sumSize.size__sum}}</p>
  <p>total allocated size: {{sumAllocated.allocatedsize__sum}}</p>
 </body>
</html>

Reload the link and you should have the following.


 

The end... almost

In this article, we learned how InterSystems IRIS stores data, how to access it from Python by building an API, and how to use IRIS as a cloud system so we can keep track of storage and analyze it easily. On the horizon, we can see more complex queries, dashboards, automated updates, and a notification system.
In the next article, I will take a step closer to that horizon, showing how to filter and order the data before displaying it and how to add some client-side editing options; to top it off, we will add a pinch of CSS to make it charming.
Would you like to see something I haven't covered yet? Please contact me if you have any ideas or needs you would like me to write about.
 

Article
· Jul 4, 2023 · 6 min read

Step by step guide to create personalized AI with ChatGPT by using LangChain

As an AI language model, ChatGPT is capable of performing a variety of tasks like language translation, writing songs, answering research questions, and even generating computer code. With its impressive abilities, ChatGPT has quickly become a popular tool for various applications, from chatbots to content creation.
But despite its advanced capabilities, ChatGPT cannot access your personal data. So in this article, I will demonstrate the following steps to build a custom ChatGPT AI using the LangChain framework:

  • Step 1: Load the document 

  • Step 2: Splitting the document into chunks

  • Step 3: Use Embedding against Chunks Data and convert to vectors

  • Step 4: Save data to the Vector database

  • Step 5: Take data (question) from the user and get the embedding

  • Step 6: Connect to VectorDB and do a semantic search

  • Step 7: Retrieve relevant responses based on user queries and send them to LLM(ChatGPT)

  • Step 8: Get an answer from LLM and send it back to the user

  NOTE: Please read my previous article, LangChain – Unleashing the full potential of LLMs, for more details about LangChain and how to get an OpenAI API key

 

So, let's begin
     

Step 1: Load the document

First of all, we need to load the document, so we will import PyPDFLoader for PDF documents.

ClassMethod SavePDF(filePath) [ Language = python ]
{
#for PDF file we need to import PyPDFLoader from langchain framework
from langchain.document_loaders import PyPDFLoader
# for CSV file we need to import csv_loader
# for Doc we need to import UnstructuredWordDocumentLoader
# for Text document we need to import TextLoader
#import os to set environment variable
import os
#Assign OpenAI API Key to environment variable 
os.environ['OPENAI_API_KEY'] = "apiKey"
#Init loader
loader = PyPDFLoader(filePath)   
#Load document 
documents = loader.load()
return documents
}

Step 2: Splitting the document into chunks

Language Models are often limited by the amount of text that you can pass to them. Therefore, it is necessary to split them up into smaller chunks. LangChain provides several utilities for doing so.

Using a text splitter can also help improve the results from vector store searches, since smaller chunks may sometimes be more likely to match a query. Testing different chunk sizes (and chunk overlaps) is a worthwhile exercise to tailor the results to your use case.

ClassMethod splitText(documents) [ Language = python ]
{
#In order to split the document we need to import RecursiveCharacterTextSplitter from Langchain framework  
from langchain.text_splitter import RecursiveCharacterTextSplitter
#Init text splitter, define chunk size 1000 and overlap = 0
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
#Split document into chunks
texts = text_splitter.split_documents(documents)
return texts
}

Step 3: Use Embedding against Chunks Data and convert to vectors

Text embeddings are the heart and soul of large language model operations. Technically, we can work with language models directly in natural language, but storing and retrieving natural language is highly inefficient.

To make it more efficient, we need to transform text data into vector form. There are dedicated ML models for creating embeddings from text: the texts are converted into multidimensional vectors. Once embedded, we can group, sort, search, and more over this data. We can calculate the distance between two sentences to know how closely they are related. And the best part is that these operations are not limited to keywords, as in traditional database searches, but instead capture the semantic closeness of two sentences. This makes search a lot more powerful, thanks to machine learning.
 

Text embedding models take text input and return a list of floats (embeddings), which are the numerical representation of the input text. Embeddings help extract information from a text. This information can then be later used, e.g., for calculating similarities between texts (e.g., movie summaries).
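As a concrete illustration of "calculating the distance between two sentences", here is a cosine-similarity sketch over toy 4-dimensional vectors. Real embedding models return vectors with hundreds or thousands of dimensions; the vectors and names below are made up for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means identical direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" - semantically close sentences get nearby vectors.
cat = [0.80, 0.10, 0.05, 0.05]
kitten = [0.75, 0.15, 0.05, 0.05]
car = [0.05, 0.05, 0.80, 0.10]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

A vector database like Chroma performs essentially this comparison, at scale, between the query embedding and every stored chunk embedding.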


ClassMethod getEmbeddings() [ Language = python ]
{
#Get the embeddings model from the LangChain framework
from langchain.embeddings import OpenAIEmbeddings
#Define embedding
embedding = OpenAIEmbeddings()
return embedding
}
    

Step 4: Save data to the Vector database

ClassMethod saveDB(texts, embedding) [ Language = python ]
{
#Get Chroma db from langchain
from langchain.vectorstores import Chroma
# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
# e.g., we are saving data in the myData folder in the current application path
persist_directory = "myData"
vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)
#save document locally
vectordb.persist()
vectordb = None
}
    

Step 5: Take data (question) from the user and get the embedding

ClassMethod getVectorData(query) [ Language = python ]
{
#NOTE: We must use the same embedding that we used when saving the data
from langchain.embeddings import OpenAIEmbeddings
#get embeddings
embedding = OpenAIEmbeddings()
#the user's question arrives as the query parameter
#Code continues in Step 6...

Step 6: Connect to VectorDB and do a semantic search

#...code continued from Step 5
from langchain.vectorstores import Chroma
persist_directory = "myData"
## Now we can load the persisted database from disk and use it as normal
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
return vectordb
}

Step 7: Retrieve relevant responses based on user queries and send them to LLM(ChatGPT)

Conversational memory is how a chatbot can respond to multiple queries in a chat-like manner. It enables a coherent conversation, and without it, every query would be treated as an entirely independent input without considering past interactions.

(Figure: the LLM with and without conversational memory. The blue boxes are user prompts, and the grey boxes are the LLM's responses. Without conversational memory (right), the LLM cannot respond using knowledge of previous interactions.)

The memory allows a Large Language Model (LLM) to remember previous interactions with the user. By default, LLMs are stateless — meaning each incoming query is processed independently of other interactions. The only thing that exists for a stateless agent is the current input, nothing else.
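As a toy illustration of what conversational memory adds (this is not LangChain's implementation, just a sketch of the idea), prior turns can simply be prepended to each new prompt so the model sees the whole conversation:

```python
class ConversationBuffer:
    """Minimal chat memory: stores past turns and prepends them to each prompt."""

    def __init__(self):
        self.history = []

    def add(self, role, text):
        self.history.append((role, text))

    def build_prompt(self, new_question):
        # The model receives the full transcript, not just the latest question.
        lines = [f"{role}: {text}" for role, text in self.history]
        lines.append(f"Human: {new_question}")
        return "\n".join(lines)

memory = ConversationBuffer()
memory.add("Human", "Who wrote the report?")
memory.add("AI", "Dr. Smith wrote it.")

# The follow-up "When?" only makes sense with the earlier turns included.
print(memory.build_prompt("When?"))
```

LangChain's ConversationBufferMemory, used below, plays this role for the retrieval chain.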


The ConversationalRetrievalChain is a LangChain component designed to retrieve relevant content based on user queries. It uses a retrieval-based approach: it searches the vector store for the passages most relevant to a given query and passes them, together with the chat history, to the LLM so it can provide accurate and helpful responses in context.

ClassMethod retriveResponse(vectordb) [ Language = python ]
{
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
#Conversational memory is how a chatbot can respond to multiple queries
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
#The ConversationalRetrievalChain is a conversational AI model that is designed to retrieve relevant responses based on user queries
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectordb.as_retriever(), memory=memory)
return qa
}


Step 8: Get an answer from LLM and send it back to the user 

ClassMethod getAnswer(qa, query) [ Language = python ]
{
#Get an answer from the LLM and send it back to the user
getAnswer = qa.run(query)
return getAnswer
}

For more details and features, please check out my irisChatGPT application.

Related Video

Thanks

Article
· Jul 4, 2023 · 2 min read

Build an IRIS image with CPF merge

When it comes to building an IRIS image, we can use CPF merge files.

Here is a CPF merge example:

[Actions]
CreateDatabase:Name=IRISAPP_DATA,Directory=/usr/irissys/mgr/IRISAPP_DATA

CreateDatabase:Name=IRISAPP_CODE,Directory=/usr/irissys/mgr/IRISAPP_CODE

CreateNamespace:Name=IRISAPP,Globals=IRISAPP_DATA,Routines=IRISAPP_CODE,Interop=1

ModifyService:Name=%Service_CallIn,Enabled=1,AutheEnabled=48

CreateApplication:Name=/frn,NameSpace=IRISAPP,DispatchClass=Formation.REST.Dispatch,AutheEnabled=48

ModifyUser:Name=SuperUser,PasswordHash=a31d24aecc0bfe560a7e45bd913ad27c667dc25a75cbfd358c451bb595b6bd52bd25c82cafaa23ca1dd30b3b4947d12d3bb0ffb2a717df29912b743a281f97c1,0a4c463a2fa1e7542b61aa48800091ab688eb0a14bebf536638f411f5454c9343b9aa6402b4694f0a89b624407a5f43f0a38fc35216bb18aab7dc41ef9f056b1,10000,SHA512
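One common way to apply such a merge file when building an image (a sketch — the base image tag and file paths here are assumptions) is to copy it into the container and point the ISC_CPF_MERGE_FILE environment variable at it, so IRIS applies the merge on startup:

```dockerfile
FROM intersystemsdc/iris-community:latest
# Copy the merge file shown above into the image
COPY merge.cpf /tmp/merge.cpf
# IRIS applies the file referenced by ISC_CPF_MERGE_FILE when the instance starts
ENV ISC_CPF_MERGE_FILE=/tmp/merge.cpf
```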