From the course: Microsoft Azure AI Engineer Associate (AI-102) Cert Prep by Microsoft Press
Optimize visual content review processes
- [Instructor] So, again, we're taking up the issue of enterprise scale. Fabrikam needs to screen their seller uploads to make sure that nothing embarrassing, inaccurate, or hostile gets through in front of customers. So what are we doing? We're thinking of the various Azure tools that allow us to do this as a fast, event-driven operation. A blob upload, that is, a seller completing a file upload in their front-end application, could trigger a logic app or a function, which sends a request to the Azure AI Vision API to check, for example, for inappropriate adult or racy content. From there, depending upon the results of the analysis, we could route the message, and actually it would be the upload as well as its metadata, to Cosmos DB for permanent archival. We could put it on a queue. We could send email. Especially with logic apps, you've seen how eminently pluggable they are. This reference architecture, which I have the attribution for right here, is from the Azure Architecture Center, go.techtrainertim.com/ica. This architecture shows how to build a scalable, automated image moderation pipeline using Azure AI Vision. Let's look from left to right, shall we? A user uploads an image to blob storage, that triggers a logic app or Azure function, which calls the Vision API. Does this look familiar? Yes, it's pretty much the same flow as the code I showed you a moment ago in Node, I think it was. This is just the representation using Azure icons. (There's a small code sketch of that trigger-and-analyze flow below.)

The AI-102 certification exam alert I have for you this time around is called PBT, performance-based testing. Let me get my drawing tools out here. Here's the way the exam is going to work. You'll have the main trunk line of AI-102, and you'll have one time limit, I think it's about two and a half hours. We can't talk about the specifics of the exam, so I have to speak in generalities, but it's approximately 40 or so questions. But here's the thing, and I think I mentioned this earlier in the training: when you hit a performance-based lab, or if you hit a case study, you have to do those right then and there, and when you exit those environments, you can never return to them. Now, that's not bad or good, it just is what it is, all right? The performance-based lab, as you can see, gives you a Remote Desktop Protocol session into what looks like an Edge browser, and you sign in to the real Azure portal with credentials that you get over here on the right side of the screen. This is all taking place in your Pearson VUE testing software on your Mac or PC, all right? And lastly, I want you to take heart that the tasks here are not interdependent. If you feel stuck, go ahead and go to the next task. You can mark them complete; if you just can't mark one complete, don't. But I would definitely say try your hardest. I always say for multiple-choice and interactive items, you never, ever want to leave an answer completely blank, and I would say do the same here in the portal environment as well. Last thing: notice that it shows the Cloud Shell. That's intentional. If you're more comfortable using shell scripting with PowerShell or the Azure CLI to do the work, you absolutely can do that. It's the state of the JSON, ultimately, that's captured by the exam testing engine and correlated with your credentials, and that's what gives you your score. Now, you'll never know how you did on the PBT specifically; it's private. I'm just telling you this so that you're well forearmed.
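To make that reference architecture concrete, here is a minimal sketch of the blob-trigger-plus-Vision flow described above, written against the Node.js v4 Azure Functions programming model. The function name, the seller-uploads container, and the VISION_ENDPOINT / VISION_KEY app settings are illustrative assumptions, not the course files; routing to Cosmos DB, a queue, or email is left as a comment where it would go.

```js
// Minimal sketch (not the course code): blob-triggered moderation check.
// Assumes app settings VISION_ENDPOINT and VISION_KEY, and a container named "seller-uploads".
const { app } = require('@azure/functions');

app.storageBlob('moderateSellerUpload', {
  path: 'seller-uploads/{name}',
  connection: 'AzureWebJobsStorage',
  handler: async (blob, context) => {
    // Image Analysis 3.2 REST call asking only for the Adult visual feature.
    const url = `${process.env.VISION_ENDPOINT}/vision/v3.2/analyze?visualFeatures=Adult`;
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Ocp-Apim-Subscription-Key': process.env.VISION_KEY,
        'Content-Type': 'application/octet-stream'
      },
      body: blob // the raw image bytes handed to the blob trigger
    });
    const result = await response.json();

    // Route on the analysis. Here we only log; in the reference architecture this is
    // where you would write to Cosmos DB, drop a queue message, or send an email.
    if (result.adult?.isAdultContent || result.adult?.isRacyContent) {
      context.warn(`Flagged upload ${context.triggerMetadata.name}`, result.adult);
    } else {
      context.log(`Upload ${context.triggerMetadata.name} passed moderation`);
    }
  }
});
```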
All right, in the demo, we're going to get into computer vision here formally. We're in the Azure management portal under Azure AI services, and I don't want to start off on the wrong foot here, but I do want to open up with a little bit of a can of worms. One of the things I need to show you is how to use, here we go, the Vision Studio. We've seen that the various Azure AI (cognitive) services have their own web portal to test out the service, and we're absolutely going to do that, but you'll find that depending upon what type of computer vision resource you surface, you may not be able to use it in the Vision Studio, okay? If I click View all resources, at least one of these I can't use as a default resource, and if I hover over it, it says it doesn't support all scenarios, like Face, OCR, and content moderation; only Cognitive Services resources are available. Now, what the heck does that mean? Well, I was able to use the Vision Studio after I created a Cognitive Services multi-service account. Now, I think those are eventually destined to go away. If you look down here in the Azure AI services, the idea is that if you've got a group using more than one service, you can put it all under one endpoint. See my endpoint up here? And you can put it all behind a pair of interchangeable keys and connection strings. The benefit is that you don't have to create separate services. Decision, Language, Speech, Metrics Advisor, Document Intelligence, and Vision are included. So there are two ways to get into Vision, and that's important because, as I've mentioned before and I'm sure I'll mention again, you don't want to pay extra. You don't want multiple instances of Computer Vision when you don't need them, is what I'm trying to say.

All right, well, in the Vision Studio, which you can get to most easily by its DNS name, portal.vision.cognitive.azure.com, the interface is "try it out," which is super good for education. That's pretty much all this is: a try-it-out portal. But again, notice that you have to have a resource that works, because you're actually sending real requests. Let's step through these requests. First, we have OCR, Optical Character Recognition. The use case here is that we need to extract text from images. Maybe we've got traveling salespeople doing research in other countries, and they've taken all of these pictures of billboards and signs. Again, with Azure and Azure AI, we have scale on our side. We aren't going to be capped out at 50 images, you know? We can throw 50,000 images at the AI just as easily as we can throw 50. I also want to draw your attention, as a developer for sure, to these golden links up here to the REST API specs. You definitely want to read those docs. We have to acknowledge that we are burning real transactions, so let's look at some of the sample material here, because the samples have been chosen for good reason by Microsoft. For example, I've been around since the early days of OCR, when you had a bad scan and the paper was bent. I mean, look at all the issues going on with this Social Security card scan. This is obviously a bit contrived, but the result really is being pulled back; this JSON is really coming back from my cognitive service. It's contrived in the sense that the underlying model has seen this image so many times that it's always going to give you a dead ringer, a perfect example.
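Just to make the "real requests" point concrete, here is a minimal sketch of the kind of OCR call Vision Studio is making on your behalf, using the Image Analysis 4.0 REST endpoint with the read feature. The VISION_ENDPOINT / VISION_KEY variables and the billboard URL are illustrative assumptions, not something from the demo.

```js
// Minimal sketch of the same OCR call Vision Studio issues on your behalf.
// Assumes VISION_ENDPOINT and VISION_KEY environment variables; the image URL is a placeholder.
const endpoint = process.env.VISION_ENDPOINT;
const key = process.env.VISION_KEY;

async function readText(imageUrl) {
  // Image Analysis 4.0 with the "read" feature returns OCR results in a single call.
  const url = `${endpoint}/computervision/imageanalysis:analyze?api-version=2023-10-01&features=read`;
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': key,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: imageUrl })
  });
  if (!response.ok) throw new Error(`Vision call failed: ${response.status}`);
  return response.json(); // the same JSON Vision Studio pretty-prints for you
}

readText('https://example.com/billboard.jpg')
  .then(result => console.log(JSON.stringify(result.readResult, null, 2)))
  .catch(console.error);
```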
You get that kind of near-perfect result on the samples, but the benefit is we don't have to do the day-to-day machine learning lifecycle work like we would if we had Azure Machine Learning service running. We can just tap into it. Now, you will learn in time that the Custom Vision service allows you to train on your own unique material. Maybe you're a widget manufacturer, and maybe you're not yet a Coca-Cola where you're known. I mean, if you're a celebrity, or if it's a known top-line product, the AI probably knows about it already. All right, what do I want to say about the JSON? Well, it's much more likely that you're going to see the raw JSON on your exam, not the pretty-printed result, so you have to be really comfortable with the schema. It's pretty straightforward. Basically, object detection deals with bounding boxes, and you've got bounding polygons for each line and each word. This is a case where the ability to shred JSON is a very strong skill to have in your skill set (there's a small example of walking this result below). Another thing that's critical for AI-102 and beyond, and I've mentioned this, is the rating, or the confidence score. We've seen this with Content Safety; confidence is normally zero to one. 0.0 is no confidence, 1.0 is full confidence. In this case, we're seeing, wow, that looks like a pretty low number, but you might find that the confidence seems low for a given number or word, while the confidence for the whole line or the whole document is much higher. It totally depends. What are some other examples that have traditionally presented challenges to OCR? Well, handwriting, for sure. Think of working in a medical office: at least where I live in the USA, there's historically a stereotype of physicians having awful handwriting. Here, again, we're detecting attributes by bounding box. Another neat thing is this front end that Microsoft built; it's pretty cool and lets you interact with the results and correlate them with the image. Tape, that's another novel use case. And notice that you can browse for your own files; you can even take a picture with your webcam if you want to. All right, so let's go to spatial analysis. This one is like a 3D view with video, as you can see: look through a video for a particular object. Looks like we've got media, industrial, and retail. Let's check out that retail example. Find a specific moment in the video based on a natural language search: show me a person with a pink jacket, as we can see. This could be potentially super duper useful. I'm thinking of exam questions that say you've got security footage, and you want to make sure it's enriched so you can quickly differentiate bank employees from customers, these sorts of things. Now, some of these abilities get pretty sensitive because they deal with personally identifiable information. This is why I yammer on a lot about Microsoft's responsible AI pillars. It's not just something we need to know for the exam and for the cert. We need to abide by them as customers of these services, okay? All right, so let's go back to Vision. Face is fascinating because you could really stand up a multifactor authentication system by consuming these APIs: detect faces in an image, liveness detection. What is your confidence that this is in fact a live human being? That's kind of science fiction-ish in a way, isn't it? I would say the 80% scenario operations are these image analysis ones, and honestly, they're pretty mundane things.
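Since being able to shred that JSON matters so much, here is a small sketch of walking the Read result from the call shown earlier: blocks, then lines, then words, pulling out each word's confidence and averaging it per line. The helper name and the averaging are my own illustration; the property names follow the Image Analysis 4.0 readResult schema.

```js
// Sketch of "shredding" an Image Analysis 4.0 readResult: walk blocks -> lines -> words,
// keeping each word's bounding polygon and confidence, plus an average per line.
// Assumes `result` is the JSON returned by the readText() call sketched earlier.
function summarizeRead(result) {
  const lines = [];
  for (const block of result.readResult?.blocks ?? []) {
    for (const line of block.lines) {
      const wordConfidences = line.words.map(w => w.confidence); // 0.0 = no confidence, 1.0 = full
      lines.push({
        text: line.text,
        polygon: line.boundingPolygon, // corner points as {x, y} pairs
        averageWordConfidence:
          wordConfidences.reduce((a, b) => a + b, 0) / wordConfidences.length
      });
    }
  }
  return lines;
}
```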
Things like: give me some marketing captions, dense or light, for images; do some object detection; extract some tags. A lot of these seem skewed more toward marketing, but nonetheless, this is computer vision. So let's see what we've got. Well, this is interesting because the train is so blurred. Looks like "subway train" was picked up with 79.5% what? Confidence. Remember that? Now, see this slider? This is interesting because it evidently shows us the ability to customize the API using properties. You know this if you're an experienced developer. If you're not, one of the cool things about being a dev is that with these APIs, as long as the control is surfaced to us, we can twiddle it, you know? In this case, what is it doing? Only show suggested objects if the probability is above this percentage. Oh, so only if you're this certain. Okay, interesting. Then again, if we look at the JSON we get in the results, the specific schema is going to depend on the operation, I should say. We've got dimensions, bounding box values, tag values, confidence values, as I said before. It all comes down to largely the same kind of stuff. I hope that gives you a great deal of confidence, and it makes you excited to use these services in your own career. All right.

To finish out, I want to show you my Woodgrove Bank image moderation app. This is in the go.techtrainertim.com/ai102 course files, as everything else is, the image moderation app JS file. There's a lot of heavy lifting in here; I build the web server in here. Don't look at this as best programming practices so much as something that nicely and educationally gets the job done. Basically, the app simulates an internal content review portal at Woodgrove. It's built with Node and Express, and when the user submits a file, right down here on lines 131 and 132, we can see the URLs being built. When the user submits an image URL, the server hits both Azure AI Content Safety and Vision in parallel. You see, it's educational; I wanted to show both. Content Safety returns harm-category severities, Vision flags adult and racy content, and then the app also assigns a correlation ID. It's got a routine for cleaning up the port, killing anything already listening on it, before it starts. There is some pretty cool code in here, I must say, but ultimately, at the end of the day, and I know I keep repeating this, see these headers? You're just building HTTP requests, back and forth, back and forth, back and forth. Let me make sure my server is still running down here. Yep. It says my image moderation portal is running on localhost:3002. First, though, when you look at the code, you'll understand a bit more. This is Azure Storage Explorer, a free, closed-source, cross-platform tool that Microsoft makes. I'm signed in with my Tech Trainer Tim account and browsing my storage accounts. What I've done is create a blob container called demos, and I specifically set the access level to anonymous read, because for the Woodgrove Bank assignment, these images temporarily need that level of exposure. The reason I want to take a look is that I'm going to right-click and grab one of their URLs; we're going to need it for the running app.
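For the parallel fan-out the Woodgrove app performs, here is a rough sketch of that pattern, not the course file itself: one image URL sent to Content Safety and Vision at the same time with Promise.all, then combined into a single verdict along with a correlation ID. The environment variable names and the severity and score thresholds are illustrative assumptions.

```js
// Rough sketch of the parallel moderation pattern described for the Woodgrove app.
// Endpoint/key variable names and the flagging thresholds are illustrative assumptions.
const crypto = require('crypto');

async function moderateImage(imageUrl) {
  const correlationId = crypto.randomUUID();

  // Content Safety wants the image bytes (base64); Vision 3.2 can take the URL directly.
  const imageBytes = Buffer.from(await (await fetch(imageUrl)).arrayBuffer());

  const [safetyRes, visionRes] = await Promise.all([
    fetch(`${process.env.CONTENT_SAFETY_ENDPOINT}/contentsafety/image:analyze?api-version=2023-10-01`, {
      method: 'POST',
      headers: {
        'Ocp-Apim-Subscription-Key': process.env.CONTENT_SAFETY_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ image: { content: imageBytes.toString('base64') } })
    }),
    fetch(`${process.env.VISION_ENDPOINT}/vision/v3.2/analyze?visualFeatures=Adult`, {
      method: 'POST',
      headers: {
        'Ocp-Apim-Subscription-Key': process.env.VISION_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ url: imageUrl })
    })
  ]);

  const safety = await safetyRes.json(); // { categoriesAnalysis: [{ category, severity }, ...] }
  const vision = await visionRes.json(); // { adult: { isAdultContent, isRacyContent, ... } }

  const flagged =
    (safety.categoriesAnalysis ?? []).some(c => c.severity >= 2) || // illustrative threshold
    vision.adult?.isAdultContent ||
    vision.adult?.isRacyContent;

  return { correlationId, flagged, safety, vision };
}
```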
I just wanted to show you where these are, and that this stuff actually works. You know how a moment ago we looked at the logic app and function app workflow? As soon as these uploads down here finished landing in this blob container, those events could be tripping off my function app like popcorn, working in great parallel execution. Isn't that cool? Here's the UI for the Woodgrove Bank portal, but I just remembered there's another helper Node file in the course files that grabs all of my URIs from blob storage. I just made it as a helper, so I could actually grab them from there if I wanted to as well. Anyway, another thing that you might want to think about, let me assign it to you for homework, is adjusting this front end to accept not only image URLs but also file uploads, you know? Oh, well, nuts. I've got some debugging to do, but the idea is that the URL is forwarded in a request to both of those services, and ideally, what we want to see is a report page that shows us, with lots of emoji, what the results were. As you can see in my console output down here, I'm having some issues divining the right format, the right properties. This is some of the unfortunate stuff of working with a fast-moving technology like Azure AI, so this is real, live debugging output you're seeing down here. Well, that's a problem. That's my responsibility to fix now, and I'll make sure that when we come back to the service in the next lesson, this is working, so I can show you. I'll see you then.
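For reference, a helper along the lines of the one just mentioned might look like this: list every blob URL in the demos container using the @azure/storage-blob package. The connection string variable name is an assumption; the container name comes from the demo.

```js
// Sketch of the kind of helper described: list every blob URL in the "demos" container.
// Assumes an AZURE_STORAGE_CONNECTION_STRING environment variable is set.
const { BlobServiceClient } = require('@azure/storage-blob');

async function listDemoImageUrls() {
  const serviceClient = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING
  );
  const container = serviceClient.getContainerClient('demos');

  const urls = [];
  for await (const blob of container.listBlobsFlat()) {
    // Browsable directly here because the container's access level is anonymous read.
    urls.push(`${container.url}/${blob.name}`);
  }
  return urls;
}

listDemoImageUrls().then(urls => urls.forEach(u => console.log(u)));
```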