Posts

Showing posts with the label Microsoft

Exploring Hugging Face and Microsoft’s Vision Models

Image
  Hello everyone! As many of you might already know, Hugging Face is one of the most significant platforms available today for AI enthusiasts and professionals. It hosts a plethora of state-of-the-art models which can be leveraged to solve various AI use cases. In today's blog post, we will delve into two crucial topics that you should be familiar with if you are diving into the data science industry: How to call any open-source model from the Hugging Face Hub. Microsoft's new model, Phi-3 Vision, a multimodal model that processes both images and text. Let's break down these topics step by step. Understanding Phi-3 Vision Phi-3 Vision is a groundbreaking multimodal model by Microsoft. It boasts a whopping 4.2 billion parameters and has both language and vision capabilities. This means it can work with both images and text, making it incredibly versatile. It is especially optimized for understanding charts and diagrams, generating insights, and answering questions based on v...

Unlocking the Power of Microsoft's Multimodal Model 53 Vision for OCR

Image
  Welcome Fellow Learners Microsoft has recently released a state-of-the-art, open-source model called Multimodal Model 53 Vision. This model, part of Microsoft's 53 family, is capable of handling both vision data and text data due to its multimodal capabilities. With a context length of 128k, it offers significant latency and compute benefits. You can find this model on popular model repositories and it is designed for three primary use cases: general image understanding, OCR (Optical Character Recognition), and chart and table understanding. Our use case will be OCR, specifically extracting text from an invoice image. Let's get started! Steps to Implement the 53 Vision Model Step 1: Setting Up the Environment First, we need to install all the required libraries. Open your Visual Studio Code and create a new project folder. Inside this folder, create three files: requirements.txt , main.py , and .env . requirements.txt numpy Pillow requests torch torchvision transformers accel...