Quickly deploying Hugging Face Models to AWS SageMaker.
A step-by-step guide to deploying HuggingFace models to AWS SageMaker and handling Inference endpoints.
In this article, we will explore how to deploy state-of-the-art Hugging Face models to AWS SageMaker. More specifically, we will follow a step-by-step guide and shed light on the sub-steps involved.
Let’s get started 🎉
Well, if you’re still reading, this article must sound useful to you. If you want to deploy Hugging Face models in an environment that you can control and monitor, AWS SageMaker is an excellent choice.
SageMaker and HuggingFace 101
Before diving into the technical details, let’s get a quick overview of Hugging Face and the AWS SageMaker service.
You can jump to the next section if you already have a good understanding.
HuggingFace is a state-of-the-art ML platform that hosts datasets, models, spaces, and daily trending ML papers. As of October 2024, the platform boasts more than 750,000 models, 300,000 datasets, and 150,000 spaces. It has become the ideal platform for AI engineers, practitioners, and enthusiasts, hosting the latest models from tech giants like Google, IBM, Intel, Microsoft, Apple, Meta, and NVIDIA. Read more about HuggingFace in my last article
AWS SageMaker is a fully managed service that simplifies the machine learning lifecycle. It handles everything from data preparation to model deployment. The service runs on a pay-as-you-go model with no upfront commitments. You only pay for the resources you actually use.
AWS SageMaker stands out for three main advantages: scalability, cost-effectiveness, and security.
- Scalability: it handles heavy workloads without you having to manage the underlying infrastructure.
- Cost-effectiveness: with the pay-as-you-go model, you are billed only for the resources you use.
- Security: it ships with built-in security and compliance features for enterprise deployments.
For detailed information, including pricing, see the official AWS SageMaker pricing guide.
From HuggingFace to AWS SageMaker
Before we dive into the details, did you know that you can deploy these models directly to Hugging Face Spaces? So why would you need SageMaker at all?
You can jump to the next section if you already have your answer 😅.
AWS SageMaker offers distinct advantages that make it the perfect solution for many use cases. When compared to HuggingFace Spaces, SageMaker excels in three areas. It provides automatic scaling capabilities that adjust to your workload demands. The platform offers deep observability into your model’s performance and health. SageMaker also takes care of infrastructure management, letting you focus on your ML tasks.
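To make the automatic scaling point a bit more concrete, here is a minimal sketch of how an already deployed endpoint could be registered with Application Auto Scaling through boto3. The endpoint name, the capacity limits, and the target of 100 invocations per instance are placeholder assumptions, not recommendations, and the helper names are made up for this illustration. `AllTraffic` is the variant name the SageMaker Python SDK uses by default.

```python
def build_autoscaling_config(endpoint_name, variant_name="AllTraffic",
                             min_instances=1, max_instances=2,
                             invocations_per_instance=100.0):
    """Build the keyword arguments for the two Application Auto Scaling calls.

    All numeric defaults are illustrative assumptions; tune them to your traffic.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Declares the endpoint variant as something that can be scaled
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }

    # Target-tracking policy on the average invocations per instance
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return target, policy


def apply_autoscaling(endpoint_name, **options):
    """Send the configuration to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("application-autoscaling")
    target, policy = build_autoscaling_config(endpoint_name, **options)
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

Once an endpoint is running, calling `apply_autoscaling("my-endpoint-name", max_instances=3)` would let SageMaker add or remove instances as traffic changes. Splitting the builder from the AWS call keeps the configuration easy to inspect before anything is sent.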
Hands-on Deployment
For this practical demonstration, we will deploy impira/layoutlm-document-qa, a LayoutLM-based multi-modal model designed for document question answering. This model excels at extracting information from documents by understanding both textual content and spatial layout. The model has been fine-tuned on the SQuAD2.0 and DocVQA datasets.
First, make sure your AWS credentials are configured. This official tutorial will help you set them up correctly.
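A quick way to confirm that the credentials are picked up is to ask AWS STS who you are. The snippet below is a small sketch of that check. The helper name `describe_caller` is made up for this example, and it assumes boto3 is installed and can find your credentials.

```python
def describe_caller(sts_client):
    """Return a short description of the identity behind the current credentials."""
    identity = sts_client.get_caller_identity()
    return f"account={identity['Account']} arn={identity['Arn']}"


# Usage (requires boto3 and configured credentials):
#   import boto3
#   print(describe_caller(boto3.client("sts")))
```

If the call fails with a credentials error, the configuration step above still needs attention.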
At this point, we have to install the various dependencies we need.
pip install sagemaker boto3 transformers pillow torch
Next, we need to configure our AWS SageMaker environment. You can use the official guide to quickly set it up.
Then we use the SageMaker SDK to define our HuggingFaceModel.
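from sagemaker.huggingface.model import HuggingFaceModel
import sagemaker

# IAM role that SageMaker assumes to create and run the endpoint
role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',  # required together with transformers_version
    role=role,
    # The inference container reads the Hub model ID and pipeline task from env vars
    env={
        'HF_MODEL_ID': 'impira/layoutlm-document-qa',
        'HF_TASK': 'document-question-answering',
    },
)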
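One practical note: `sagemaker.get_execution_role()` is designed for SageMaker notebooks and Studio. When you run the code from a local machine with IAM user credentials, it typically raises a `ValueError` because the current identity is not a role. A minimal sketch of a fallback is shown below. The helper name and the role name `sagemaker_execution_role` are assumptions for illustration, so replace the role name with a role that actually exists in your account and has SageMaker permissions.

```python
def resolve_execution_role(get_notebook_role, iam_client, fallback_role_name):
    """Return a role ARN, falling back to an IAM lookup outside SageMaker.

    get_notebook_role: a callable such as sagemaker.get_execution_role
    iam_client: a boto3 IAM client
    fallback_role_name: name of an existing IAM role (hypothetical here)
    """
    try:
        return get_notebook_role()
    except ValueError:
        # Not running under a role, so look the role up by name instead
        return iam_client.get_role(RoleName=fallback_role_name)["Role"]["Arn"]


# Usage (requires sagemaker, boto3, and configured credentials):
#   import boto3, sagemaker
#   role = resolve_execution_role(
#       sagemaker.get_execution_role,
#       boto3.client("iam"),
#       "sagemaker_execution_role",  # hypothetical role name
#   )
```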
For deployment, SageMaker offers various instance types to match your performance and cost requirements.
- The ml.m5.xlarge instance type works well for most general-purpose deployments.
- For GPU-accelerated inference, the ml.g4dn.xlarge instance provides a single NVIDIA T4 GPU.
To deploy the model, we simply execute
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
)
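The `deploy` call waits for the endpoint by default, but it can still be handy to check the status yourself, for example from another script. The following is a small sketch using the boto3 SageMaker client. The helper name `endpoint_is_ready` is made up for this example.

```python
def endpoint_is_ready(sm_client, endpoint_name):
    """Return True when the endpoint is InService, raise if creation failed."""
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = description["EndpointStatus"]
    if status == "Failed":
        # FailureReason is only present when something went wrong
        reason = description.get("FailureReason", "unknown reason")
        raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
    return status == "InService"


# Usage (requires boto3 and configured credentials):
#   import boto3
#   print(endpoint_is_ready(boto3.client("sagemaker"), predictor.endpoint_name))
```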
That’s it! The model has been deployed to AWS SageMaker. Now let’s test the deployed endpoint.
from PIL import Image
import base64
from io import BytesIO

# Load the image and encode it as a base64 string
image = Image.open("flight_ticket.png")
buffered = BytesIO()
image.save(buffered, format="PNG")
image_str = base64.b64encode(buffered.getvalue()).decode()

# Prepare the payload
payload = {
    "inputs": {
        "image": image_str,
        "question": "What is the ticket's unique ID?"
    }
}

# Send the request: the predictor serializes the dict to JSON
# and returns the already-parsed JSON response
response = predictor.predict(payload)
print(f"🟢OUTPUT => {response}")
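Applications that do not have the `predictor` object at hand, such as a backend service, can call the same endpoint through the boto3 `sagemaker-runtime` client. Here is a minimal sketch of that approach. The helper names are made up for this example, the payload layout simply mirrors the one used above, and the shape of the response depends on the model and container, so treat it as an assumption to verify against your own endpoint.

```python
import base64
import json


def build_docqa_payload(image_bytes, question):
    """Build the JSON-serializable request body from raw image bytes."""
    return {
        "inputs": {
            "image": base64.b64encode(image_bytes).decode("utf-8"),
            "question": question,
        }
    }


def invoke_docqa(runtime_client, endpoint_name, image_bytes, question):
    """Send one document question to the endpoint and parse the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(build_docqa_payload(image_bytes, question)),
    )
    # The response body is a stream, so read it before decoding
    return json.loads(response["Body"].read())


# Usage (requires boto3 and configured credentials):
#   import boto3
#   with open("flight_ticket.png", "rb") as f:
#       answer = invoke_docqa(boto3.client("sagemaker-runtime"),
#                             predictor.endpoint_name, f.read(),
#                             "What is the ticket's unique ID?")
```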
Very important! If you’re doing this only for learning purposes, don’t forget to clean up the resources after testing. To do so, run:
predictor.delete_endpoint()
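If the notebook session is gone and the `predictor` object with it, the same cleanup can be done with the boto3 SageMaker client, as long as you know the endpoint name. The sketch below also removes the endpoint configuration and the model entries, which may otherwise remain in your account. The helper name is made up for this example, so double-check the endpoint name before running anything like it.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint plus its endpoint config and the models it references."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]

    # Collect the model names before the config is deleted
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return model_names


# Usage (requires boto3 and configured credentials):
#   import boto3
#   delete_endpoint_resources(boto3.client("sagemaker"), "my-endpoint-name")
```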
In this guide, we learned how to take a Hugging Face model and deploy it on AWS SageMaker. We saw how simple it is to set up a model, deploy it to an endpoint, test it, and clean up afterwards.
Thanks for reading ❤️🔥🔥
My name is Baimam Boukar. I’m a software engineer, and I enjoy sharing my knowledge through blog posts. I write about Serverless and Machine Learning on AWS, Flutter, and sometimes random topics like Space Technologies. Let’s stay connected!
- Find me on Github and HuggingFace
- Let’s connect on LinkedIn