DP-300 Microsoft Azure Database – Evaluate performance improvements Part 3

  • By
  • July 11, 2023
0 Comment

4. 64. recommend query construct modifications based on resource usage

In a previous video, we ran this query. So we created the sales order detail copy and sales order header copy tables. But the important thing about these is that they’ve got no indexes whatsoever. So I created this query and run it and you can see it uses table scans because there’s an old wire clause. So you’ll be using a scan and it uses a hash match to create the results. Okay, so far so good. Now the computer is not at the moment complaining about the lack of indexes.

And there’s a reason for this. If you have a look at how many rows we’ve got, we’ve only got 32 rows, so really it doesn’t make that much difference. So when would it use the indexes? Well, it uses it when getting the join here between Sales order ID. Now, what if this wasn’t 32 rows? What if this was a lot more rows? So let’s double the number of rows. So I’m inserting into this table the table so that doubles the number of rows. So now there are 64 rows.

So if I execute this, we’re still fine, but let’s execute this a few more times. So 128, 256, 512. So now up to 1024 rows. So let’s execute this. Look at the execution plan, we’re still fine. Now let’s do it a few more times. So 20, 48, 40, 96, 81, 92, taking a bit longer and 16,384. So now let’s execute our original query. And if we now have a look at the execution plan, you can see that there is a missing index and a suggestion of what we create. Create nonclusive index had a name on and it gives what we should be having.

So the sales order ID and the customer ID as an include. So the include columns would be in a separate part. So in other words, you would have the main index. And then when you get to individual rows, there would then be a link to this separate part. So it’s a lot more efficient to be able to include include columns rather than just having just one column and then having to get the information elsewhere. So if I click on the dot, you can see this is the query text and if I write and click we can see missing index details and there is my code that I would need.

So let’s run that code and have a look at this query again. And now we can see it uses an index scan no clustered of this sales order header copy. Right, LAN, let’s just drop that index. So I’ll get rid of the words create and no clustered temporarily. So just drop this index. I didn’t actually give it a proper name, I’m just using the default name. So now it’s dropped on undo that. So it is still dropped. I just did the typing. So run this again and we’ll have that problem again.

So this is one way of detecting missing indexes but is there a better way to do it for an entire database? And the answer is we can use a DMV and this DMV is sys DM underscore DB underscore missing index details plural. So if we have a look at this because we are missing get this current index, it comes up so you can see it is in database ID five object ID 133-010-3779 and you can see that we’ve got an equality column of sales order ID no inequality columns and then some included columns. What are equality and inequality? Well, if we have a look at this index, it doesn’t actually give us any clues whatsoever.

So go back to the query and you can see that the query has got this equals. So that is the equality. So if somewhere like in the where clause we would have not equals to or greater than or anything apart from equality, then that would be in the inequality. When you are creating an index, you put equality first and then the inequality columns, both of these should be in the key and then you’ve got the included columns in the include section of the index. Now you can go through like this and work out what you need to do and why you need to do it and so forth.

So you can see it is this particular table, it says equality, inequality and not include columns. There are available on the internet and in the PDF that’s attached as a resource much earlier on in this course. Examples of the full thing that you can do with these sorts of things. So this is a much wider query and it uses three different DMVs but it gives basically the same results. It’s just that you now have the create index statement there so you just need to copy and paste it.

So there’s my entire index statement including a good name for the index. Going further on, we’ve got things like average user impact so that’s percentage average benefit that user queries could experience if this missing index group was implemented. You’ve got average total user cost so that’s the average cost that could be reduced by the index. So you have to go okay, this might be a big saving 98%, but if we’re saving not that much is it really worth it. But this is when this a query like this could help because it actually orders by those indexes which are most beneficial. So a quick look at what all of these different DMVs are. So they all start Dmdb missing index. So details give you the details groups that returns information about indexes that are missing from a specific index group.

And then we have group stats. So this is summary information about groups of missing index and you’ve also got Dmdb missing index group stats query which returns information about queries that are missing in index. So this is probably the biggest query construct modification I can think of actually create indexes as and when they are needed. Of course, don’t create too many because as soon as you insert additional information, update or delete or merge, then the indexes need to be updated. So that could grind your system to a halt in terms of actual changes that could be needed to the query, where we’ve already gone through everything that we need to be Sargable.

So we need to ensure, for instance, we’re not using functions when we can avoid it. So don’t use the year function when you can use between two sets of dates, don’t use left when you can use like and don’t use the is null function. So if a certain field is null, then give me this unless you’ve got a specific need for it. Instead you could say if my field is null or my field is equal to whatever. Yes, it adds a bit more to the words, but it makes it actually savable, which means that you can use any particular index that are available.

So you don’t necessarily need to know all of the specifics of how to change a query to make a Sargable query, you just need to know, for instance, you shouldn’t use functions when you can avoid it if there is a better alternative that can use an index. So the greatest thing you can do to speed up your queries is to have appropriate indexes and you can do that by using sysdmdb, missing index details and similar DMVs.

5. 65. assess the use of hints for query performance

In this video we’re going to have a look at the use of hints for query performance. So what I’m going to do is we currently have this table, which has expanded to 16,384 rows. So I’m going to drop this table and then recreate it from the original. So we’re back to 32 rows, so our queries will be a lot quicker. So, as a reminder, we currently have an execution plan with a hash match. But suppose we didn’t want a hash match, suppose we wanted another type of match, say a merge join. Well, we can do that through the use of query hints and they’re fairly easy to use. We write option and then in brackets the hint. So in this case, merge join.

Now, notice what we currently have, two scans and the hash match. So let’s run this merge join and have a look at the execution plan. So you can now see we’ve got this scans and these sorts. Now, remember what I said about sorts, they take a lot of time, they’re very costly, so you do need to be aware of that. But because of these sorts, we now have this merge join. Now, was this a good thing to have done? What I’m going to do is I’m going to duplicate this query. So now we’ve got one using a merge join and one using the previous hash match. And you can see that the one with the hash match is about twice as quick as the one with the merge join.

So the first one only uses 35% of the total batch of these two statements, whereas the second users 65%. Now, if you have a look at the Microsoft documentation regarding hints, because SQL Server query optimizer traditionally or typically sets the best execution plan for query, this should be a last resort for experienced developers and DBAs. So we have got an awful lot of query hints. Now we’ll be looking at some of them later on in this course when we have a look at intelligence query processing. But I just want to have a look at some of the main query hints rather than go to all of them because you won’t need all of them, you just need to know the main ones.

So we’ve already seen that you can have option merge join. Well, we can also have other joins, we can have a loop join. So let’s put that in and see how good that is. And you can see even worse the loop join, but it is possible to do so. We can see this table spool. A lazy spool is an additional thing that has to happen. If we wanted to force the hash join. We could of course just write hash join. We’ve also got similar things for groups and unions. So we’ve got hash group, order, group merge, union, hash union and concatunion.

Different ways of doing groups and union. But again, don’t do it unless you absolutely have to now what else do we have now instantly if you are putting in more than one option, you can do that. So you can have a comma in between options. Now keep fixed plan. Now this query won’t be recompiled when its statistics change, it will only be recompiled if the schema of the underlying tables change. What’s the schema we’re talking about the columns or if you actively run it procedure called recompile that would also change the plan. So once it’s done, it’s fixed. Sounds too much. Well what about key plan that recompiles less often when statistics change?

So statistics being for instance, you’ve got 10,000 rows for where a particular sales order is equal to one, but you’ve only got 100 rows where it’s equal to 100 and you’ve only got two rows where it’s equal to 1000. So that’s statistics. But suppose I delete all of where sales order ID equals one, where there’s no longer this 1000 or 1 million rows that we’ve got there. So those are statistics. We can also optimize for unknown. So this uses the average selection of the where rather than a specific thing. So I might run this query where the parameter is sales order ID equals one and so we have got these million rows. But I would say I just want to do the average number of rows that it would be.

And then another one that I want to talk about is robust plan. So this creates a plan that works for the maximum potential row signs. So in this case using the million rows, if you are running a query that doesn’t contain this million rows then performance may be impaired. So we’ve got keep fixed plan which doesn’t recompile at all unless something major happens. Keep plan which recompiles left often optimized for unknown which uses the average selectivity the average number of roles and robust plan which uses the maximum number of roles. And then if I was creating a store procedure so create a stored procedure, so my procedure and I have an input for the sales order ID, and I use it here.

Then I could say optimize for. So I can say create this plan where a particular parameter is a particular value. Or I could equally say where it is unknown. If you don’t do that, then the store procedure will be optimized in its first running and will keep that. So if I call this with a sales order ID of one, then it will be optimized for a million rows and then when I call it again with sales order ID 1000, well there’s only 1000 rows, 100 rows, it will still be using the same plan as if sales order ID was optimized for number one. That should have an equal sign there.

That’s why I’ve got these squiggles. So optimize for unknown quite useful if you want this door procedure to just optimize for a particular parameter with unknown values. So these are table hints so we can use them, but only when absolutely necessary, I would suggest. So we’ve got merge join, loop join, hash join. Those are probably the more widely used hints but we’ve got hash group and order group. We’ve got various different types of union and then we’ve got keep fixed plan so the plan remains the plan until, say the scheme of the table changes.

We’ve got keep plan so that re optimizes or recompiles left often we’ve got optimize four so we can optimize for known or optimize for a particular value more often used in store procedures, I would suggest. And then we’ve got robust plan creates a plan that works for the maximum potential raw size. But it the major thing is don’t use it unless you absolutely have to because quite often SQL Server query optimizer does a pretty good job.

Comments
* The most recent comment are at the top

Interesting posts

Impact of AI and Machine Learning on IT Certifications: How AI is influencing IT Certification Courses and Exams

The tech world is like a never-ending game of upgrades, and IT certifications are no exception. With Artificial Intelligence (AI) and Machine Learning (ML) taking over everything these days, it’s no surprise they are shaking things up in the world of IT training. As these technologies keep evolving, they are seriously influencing IT certifications, changing… Read More »

Blockchain Technology Certifications: Exploring Certifications For Blockchain Technology And Their Relevance In Various Industries Beyond Just Cryptocurrency

Greetings! So, you’re curious about blockchain technology and wondering if diving into certifications is worth your while? Well, you’ve come to the right place! Blockchain is not just the backbone of cryptocurrency; it’s a revolutionary technology that’s making waves across various industries, from finance to healthcare and beyond. Let’s unpack the world of blockchain certifications… Read More »

Everything ENNA: Cisco’s New Network Assurance Specialist Certification

The landscape of networking is constantly evolving, driven by rapid technological advancements and growing business demands. For IT professionals, staying ahead in this dynamic environment requires an ongoing commitment to developing and refining their skills. Recognizing the critical need for specialized expertise in network assurance, Cisco has introduced the Cisco Enterprise Network Assurance (ENNA) v1.0… Read More »

Best Networking Certifications to Earn in 2024

The internet is a wondrous invention that connects us to information and entertainment at lightning speed, except when it doesn’t. Honestly, grappling with network slowdowns and untangling those troubleshooting puzzles can drive just about anyone to the brink of frustration. But what if you could become the master of your own digital destiny? Enter the… Read More »

Navigating Vendor-Neutral vs Vendor-Specific Certifications: In-depth Analysis Of The Pros And Cons, With Guidance On Choosing The Right Type For Your Career Goals

Hey, tech folks! Today, we’re slicing through the fog around a classic dilemma in the IT certification world: vendor-neutral vs vendor-specific certifications. Whether you’re a fresh-faced newbie or a seasoned geek, picking the right cert can feel like trying to choose your favorite ice cream flavor at a new parlor – exciting but kinda overwhelming.… Read More »

Achieving Your ISO Certification Made Simple

So, you’ve decided to step up your game and snag that ISO certification, huh? Good on you! Whether it’s to polish your company’s reputation, meet supplier requirements, or enhance operational efficiency, getting ISO certified is like telling the world, “Hey, we really know what we’re doing!” But, like with any worthwhile endeavor, the road to… Read More »

img