Saulita Smith

July 8, 2024

Imagine a person with ADHD telling a story: their own thoughts keep interrupting, and they struggle to concentrate and stay on track. This is a common scenario for the millions of people worldwide who live with attention deficit hyperactivity disorder (ADHD), a condition characterised by persistent patterns of inattention, hyperactivity, and impulsivity. A recent meta-analysis of studies in children and adolescents puts prevalence at about 5.62% (Salari et al., 2023), and the condition often persists into adulthood.

How can AI and Machine Learning help individuals with ADHD overcome these challenges?

This article presents an architecture designed to use AI to help people with ADHD communicate more effectively. Its components and workflow are described in the sections that follow.

User Interaction – ADHD User

The process begins with an ADHD user interacting with the system through a smartphone. The interface is designed to be intuitive and accessible, catering to the specific needs of individuals with ADHD. The smartphone is the primary point of contact, allowing users to engage with the AI-powered system anytime and anywhere.

User Interaction – Speech-to-Text

As the user speaks, their speech is converted to text using a cutting-edge speech-to-text service. This real-time conversion is crucial for capturing the user’s thoughts as they occur, helping to mitigate the effects of ADHD-related distractions or memory lapses.

Authentication via Amazon Cognito

Security is paramount in any system dealing with personal information. The user is authenticated using Amazon Cognito, ensuring secure access to the system. This step protects the user’s privacy and maintains the integrity of the data being processed.
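As a rough illustration of this step, the sketch below signs a user in through the boto3 `cognito-idp` client's `initiate_auth` call. The `USER_PASSWORD_AUTH` flow, the app client ID, and the helper names are assumptions made for illustration; the article does not say which flow the system uses, and that flow would have to be enabled on the app client.

```python
def build_auth_request(client_id, username, password):
    """Build keyword arguments for cognito-idp initiate_auth.

    The USER_PASSWORD_AUTH flow is an assumption, not something the article states.
    """
    return {
        "ClientId": client_id,
        "AuthFlow": "USER_PASSWORD_AUTH",
        "AuthParameters": {"USERNAME": username, "PASSWORD": password},
    }


def sign_in(cognito_client, client_id, username, password):
    """Return the ID token for a successful sign-in.

    cognito_client is expected to be boto3.client("cognito-idp"); it is
    passed in so the function can also be exercised with a stub.
    """
    request = build_auth_request(client_id, username, password)
    response = cognito_client.initiate_auth(**request)
    # A challenge response (for example MFA) carries no AuthenticationResult;
    # this sketch does not handle that case.
    return response["AuthenticationResult"]["IdToken"]
```

The returned token can then accompany later requests so that the backend can verify who is calling.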

Text Processing via Input Prompt (User Interface)

The transcribed text is displayed as an input prompt on the user interface as the user speaks, removing the need for tedious manual typing. This visual representation of their speech helps users track their thoughts and provides immediate feedback, which can be particularly beneficial for individuals with ADHD who struggle to maintain focus.

AWS Cloud Services – Amazon Transcribe

Amazon Transcribe, a powerful ML/AI service, transcribes the spoken input into text. This service utilises advanced machine learning algorithms to provide accurate transcriptions, even in challenging acoustic environments or with diverse accents.
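The sketch below shows one way to drive this step from Python. For simplicity it assumes the batch `start_transcription_job` API with audio already uploaded to S3, rather than the real-time streaming the article describes; the job name, bucket URI, language code, and helper names are hypothetical. The second helper reads the transcript out of the JSON result document that Transcribe produces.

```python
import json


def build_transcription_job(job_name, media_uri, language_code="en-GB", media_format="wav"):
    """Build keyword arguments for the transcribe client's start_transcription_job."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }


def extract_transcript(transcript_json):
    """Pull the full transcript string out of a Transcribe result document."""
    document = json.loads(transcript_json)
    transcripts = document["results"]["transcripts"]
    return " ".join(part["transcript"] for part in transcripts)
```

The dictionary from `build_transcription_job` would be passed as `boto3.client("transcribe").start_transcription_job(**params)`.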

AWS Cloud Services – Amazon API Gateway

The transcribed text is sent to Amazon API Gateway for further processing. The gateway acts as a “front door” through which applications reach data, business logic, or functionality in backend services such as Lambda functions.
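A minimal sketch of the client side of this hop, using only the Python standard library. The URL, the `text` field name, and the use of the `Authorization` header to carry the Cognito ID token are assumptions; the actual route and authorizer configuration are not given in the article.

```python
import json
import urllib.request


def build_summary_request(api_url, id_token, transcript):
    """Prepare a POST request that carries the transcript to the API."""
    payload = json.dumps({"text": transcript}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed token source for a Cognito authorizer on the API.
            "Authorization": id_token,
        },
    )
```

Sending the request is then a matter of calling `urllib.request.urlopen(request)` and reading the JSON response.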

AWS Cloud Services – Lambda Function (Preprocessing)

API Gateway triggers a Lambda function that preprocesses the text. Lambda is a serverless compute service that runs code in response to events and automatically manages the underlying compute resources. The preprocessing step could involve removing filler words, correcting grammatical errors, or structuring the text for easier summarisation.
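To make the preprocessing idea concrete, here is a small sketch of a handler that strips filler words. It assumes an API Gateway proxy integration (so the request body arrives as a JSON string in `event["body"]`), a `text` field in that body, and a short, purely illustrative filler list.

```python
import json
import re

# Illustrative filler list; a real deployment would tune this.
FILLERS = ("um", "uh", "erm", "you know", "i mean")
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def clean_transcript(text):
    """Remove filler words, collapse whitespace, and tidy spaces before punctuation."""
    without_fillers = _FILLER_RE.sub("", text)
    collapsed = re.sub(r"\s+", " ", without_fillers).strip()
    return re.sub(r"\s+([,.!?])", r"\1", collapsed)


def lambda_handler(event, context):
    """Entry point invoked by API Gateway (proxy integration assumed)."""
    body = json.loads(event.get("body") or "{}")
    cleaned = clean_transcript(body.get("text", ""))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"text": cleaned}),
    }
```

Grammar correction and restructuring are harder problems and are left out of this sketch.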

Machine Learning and Summarisation – FLAN-T5 Model for Fine-Tuned Summarisation

The preprocessed text is sent to a FLAN-T5 model hosted on a SageMaker Neo endpoint. This model is fine-tuned to summarise the input text, distilling the key points from the user’s speech. FLAN-T5, an instruction-tuned model known for its ability to understand context and generate coherent summaries, is well suited to this task.

Machine Learning and Summarisation – SageMaker Neo Endpoint

This endpoint serves the FLAN-T5 model for efficient inference. Amazon SageMaker Neo optimises machine learning models to run faster on specific hardware platforms, including mobile devices. This optimisation keeps the summarisation step quick and responsive, which is crucial for maintaining the attention of users with ADHD.
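Calling the endpoint from the preprocessing Lambda might look like the sketch below, which uses the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client. The endpoint name, the `summarize: ` prompt prefix, and the Hugging Face style request and response schema (`inputs`, `parameters`, `generated_text`) are assumptions; the container actually serving the model may expect different field names.

```python
import json


def build_payload(text, max_new_tokens=128):
    """Build the request body (assumed Hugging Face style schema)."""
    return {
        "inputs": "summarize: " + text,
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def summarise(runtime_client, endpoint_name, text):
    """Send text to the endpoint and return the generated summary.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_payload(text)),
    )
    result = json.loads(response["Body"].read())
    # Assumed response shape: [{"generated_text": "..."}]
    return result[0]["generated_text"]
```

Passing the client in as an argument keeps the function easy to test with a stub and lets the Lambda reuse one client across invocations.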

Post-Processing and Response – Lambda Function (Response)

Another Lambda function processes the summarised text to prepare it for delivery. This step might involve formatting the summary, adding any necessary context, or personalising the output based on user preferences.
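One possible shape for this formatting step is sketched below. The preference keys (`style`, `max_points`) and the sentence-splitting rule are hypothetical and stand in for whatever user settings the system stores.

```python
import re


def format_summary(summary, preferences=None):
    """Trim a summary to a few sentences and optionally render it as bullets."""
    prefs = preferences or {}
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", summary.strip())
        if s.strip()
    ]
    points = sentences[: prefs.get("max_points", 3)]
    if prefs.get("style") == "bullets":
        return "\n".join("- " + point for point in points)
    return " ".join(points)
```

For example, `format_summary(text, {"style": "bullets", "max_points": 2})` keeps the first two sentences and puts each on its own line.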

Post-Processing and Response – Amazon Polly

The processed summary text is converted back to speech using Amazon Polly. This text-to-speech service uses advanced deep learning technologies to synthesise natural-sounding human speech. For users with ADHD who may prefer auditory information, this step is crucial in making the summarised content more accessible.
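A short sketch of the text-to-speech call, using the `synthesize_speech` operation of the boto3 `polly` client. The choice of the `Amy` voice and MP3 output are assumptions for illustration, as are the helper names.

```python
def build_speech_request(text, voice_id="Amy", output_format="mp3"):
    """Build keyword arguments for the polly client's synthesize_speech."""
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesise(polly_client, text, voice_id="Amy"):
    """Return the synthesised audio as bytes.

    polly_client is expected to be boto3.client("polly").
    """
    response = polly_client.synthesize_speech(**build_speech_request(text, voice_id))
    # AudioStream is a streaming body; read() returns the raw audio bytes.
    return response["AudioStream"].read()
```

The bytes can be returned to the app directly or written to storage for playback.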

Post-Processing and Response – Summarisation (User Interface)

The final summarised speech is presented to the user through the user interface. This could be audio playback accompanied by a visual representation of the critical points. The combination of audio and visual elements caters to different learning styles and helps reinforce the information for users with ADHD.

Data Flow

1. The ADHD user provides input through speech.
2. The speech is converted to text and displayed on the user interface.
3. The user is authenticated through Amazon Cognito, and the data is transmitted securely to the cloud services.
4. Amazon Transcribe handles the speech-to-text conversion.
5. The transcribed text is sent through the API Gateway to a Lambda function for preprocessing.
6. The preprocessed text is sent to the FLAN-T5 model for summarisation via the SageMaker Neo endpoint.
7. Another Lambda function processes the summarised text.
8. Amazon Polly converts the summary text back to speech.
9. The user interface delivers the summarised speech to the user.
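The flow above can be sketched as a chain of stages. In this illustration each stage is passed in as a plain Python callable, so the real AWS-backed implementations can be swapped for stubs during testing; the function and parameter names are hypothetical.

```python
def run_pipeline(audio, transcribe, preprocess, summarise, postprocess, synthesise):
    """Run the speech-to-summary flow, one stage feeding the next."""
    transcript = transcribe(audio)        # speech to text
    cleaned = preprocess(transcript)      # filler removal and tidying
    summary = summarise(cleaned)          # model inference
    final_text = postprocess(summary)     # formatting for delivery
    return {
        "transcript": transcript,
        "summary": final_text,
        "audio": synthesise(final_text),  # text back to speech
    }
```

Returning the transcript alongside the summary and audio lets the user interface show the text while the audio plays.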

This architecture leverages various AWS services to provide a seamless experience for users, transforming their spoken input into concise and meaningful summaries through advanced machine-learning models.

Empowering ADHD Users with AI: A Smart Architecture for Seamless Communication – Conclusion

This innovative AI-powered architecture represents a significant step forward in assisting individuals with ADHD. By leveraging cutting-edge ML/AI technologies like Amazon Transcribe and Amazon Polly, along with serverless computing via Lambda functions, this system can help ADHD users communicate more effectively and organise their thoughts.

The combination of speech-to-text and text-to-speech technologies, coupled with advanced summarisation models, provides a comprehensive solution that addresses many of the challenges faced by individuals with ADHD. This system helps capture and organise thoughts and presents them concisely and coherently.

As we advance in AI and machine learning, we can expect even more sophisticated solutions to emerge, further empowering individuals with ADHD and other cognitive differences. This architecture is a prime example of how technology can be harnessed to create more inclusive and accessible communication tools for all.

References

Salari, N., Ghasemi, H., Abdoli, N., Rahmani, A., Shiri, M. H., Hashemian, A. H., Akbari, H., & Mohammadi, M. (2023). The global prevalence of ADHD in children and adolescents: A systematic review and meta-analysis. Italian Journal of Pediatrics, 49(1), 48. https://doi.org/10.1186/s13052-023-01456-1
