U.S. Department of Energy - Energy Efficiency and Renewable Energy
Building Technologies Office
Restaurant Energy Performance Evaluation: How-To Guide and Spreadsheet Webinar (text version)
This webinar, presented by the National Renewable Energy Laboratory (NREL), addressed the difficulty that restaurant building owners and operators have in prioritizing energy-saving building upgrades within their building portfolio by presenting a performance evaluation how-to guide and spreadsheet toolkit developed by NREL. This webinar provided an overview of these tools, which help owners prioritize use of capital resources for cost-effective energy-efficiency measures.
Below is the text version of "Restaurant Energy Performance Evaluation: How-To Guide and Spreadsheet," originally presented on November 16, 2011. In addition to this text version of the audio, you can view the presentation slides and a recording of the webinar (WMV 13 MB).
Welcome, and thank you for standing by. At this time, all participants are in a listen-only mode for the duration of the call. Today's conference is being recorded. If anybody has any objections, you may disconnect at this time. Now, I'd like to introduce your host for the day, Ms. Michelle Resnick. You may begin.
Thank you, Chandra. My name is Michelle Resnick and I'd like to welcome you to today's webinar titled, "Restaurant Energy Performance Evaluation: How-To Guide and Spreadsheet." This webinar's presented by the Commercial Building Energy Alliances Program at the US Department of Energy. We're excited to have with us today an expert who helped develop a performance evaluation how-to guide and spreadsheet toolkit to identify poorly performing buildings. But before we start, I have some housekeeping items to cover.
First, as was mentioned, everyone today is on listen-only mode. We will have a Q&A session at the end of the presentation. You can submit your questions throughout the webinar electronically by clicking on the Q&A link of the top bar of your screen, typing the question in the box and clicking "Ask." Please be sure to click ask and not the symbol of the raised hand. Our speaker will address as many questions as time allows after the presentation.
And now, to introduce today's speaker. Today's speaker is Kristin Field from the National Renewable Energy Laboratory. Kristin specializes in whole-building energy simulation and collaborative projects with industry partners, including commercial building owners in the retail, hospital and other sectors. She will present an overview of these tools, which help owners prioritize use of capital resources for cost-effective energy efficiency measures. And with that, I'll turn the presentation over to Kristin.
Thank you, Michelle. Hi everyone and thank you for participating in this webinar. This webinar, as Michelle mentioned, is about a restaurant energy performance evaluation, how-to guide and spreadsheet. There are two products that have been made available through Department of Energy and I'd like to present on them today to sort of help people understand what they are and how to use them.
So just a kind of quick overview, what we are talking about today is the how-to guide and the spreadsheet. These are resources for comparing restaurant energy performance to that of similar buildings within a portfolio. So one important thing to note about these tools is that they're intended for a portfolio of restaurants, not for one individual restaurant and a lot of people that we worked with through the Commercial Building Energy Alliance do have portfolios of restaurants and have significant numbers of restaurants in their portfolios and so it was kinda designed for them. You could certainly take this analysis and sort of tweak it to look at an individual building but you would really need to do a lot of thinking about that. It was mostly intended towards people with kind of a population of restaurants.
So why did we do this? Basically, there's a lack of benchmarking guidance for restaurants. A lot of different commercial buildings have some benchmarking guidance from different tools out there. One of them would be the EPA Portfolio Manager which uses CBECS data, that's Commercial Buildings Energy Consumption Survey, I believe and that's a large survey of about 5,000 buildings. The problem with restaurants, when they looked at that – the data were too scattered and so they couldn't draw strong enough conclusions to include it in that tool and so the restaurants got left out. So with there being kind of a void for restaurants knowing how much energy their buildings are supposed to use, the question is if you don't know how much energy a restaurant is supposed to consume, how do you know when it's consuming too much? And if you're wanting to save energy, you need to identify the ones that are consuming too much. So this tool is meant – or, I guess couple of tools – are meant to help the restaurants with prioritization. So let's say that you know that Stores 1 through 55 are using way too much energy, you wanna sort of direct your resources there instead of thinking, "Well, I have 1,000 stores," – or, you know, I say stores, restaurants. "I have 1,000 restaurants. I can't direct all those resources so I don't really know what to do. I'll go with some other criterion besides energy savings to decide what I'm gonna retrofit." So this is intended to kinda help identify which stores might be priorities for energy-based retrofits.
And then there's the how. How do you, you know, do this sort of retrofit prioritization. The document outlines this process in steps. There's ten total steps and we broke it into two types, I guess, of steps. One of them is steps one through six. That's what we call a high level evaluation and the reason for that is that some restaurant owners or operators, whoever's doing this, are not gonna feel a need to go beyond that and so we didn't – the steps seven through ten include linear regression and for some uses, I think that would just be overkill and so we didn't want someone to look at that and say, "Okay, well we can't use this entire process." There's kind of a basic one that you can use from one through six and then if you feel that you have the resources, time and understanding, you can go through ten.
And the spreadsheet tool can kinda get someone started. We have the how-to document but then the spreadsheet actually takes real data and puts them in there so that somebody wouldn't have to create a giant spreadsheet from scratch. You can take ours and modify it. So that's the overview and we have some flow charts in the report and I just thought they were useful to show here, too.
This first flowchart is so big that it's broken up. So basically, you go through step one. There are some also – instead of just the steps there are some decision points in there. So you have step one, you gather your data. Are there 365 days? If not, you adjust to make it a 365 day year and for all of these steps, I'm actually gonna go into them in more detail so that's why I'm rushing through them.
Step three; you gather your building information. Four, you gather your weather data. Going to the next slide here. And then there's a big decision – could there be significant differences between building categories, meaning you have different menu types. You know, some restaurants, chains I guess, have a few different menu types and evaluating them together could – a lot of times is not – it doesn't make sense, basically, because they have very different cooking equipment that maybe operates on really different schedules, so if you tried to have a strong correlation between predicting how much energy they'll use, it would confuse things. Also there's whether or not they include parking lot lighting on the meter; is the store stand-alone, meaning that there's no other buildings around it or rather, I guess, tenants around it. Are they in a strip mall instead? So you can kinda see through this, actually and this presentation is gonna be posted on a website so I think I'll kinda let people peruse that, you know, on the website when they're done but just basically, I guess, could show you that you have all the five – the ten steps explained and then these flow charts show you how they all fit together.
So the next slide shows – we separated steps seven through ten out onto a different flow chart since that's sort of an optional process after one through six. So this flow chart number two just shows steps seven through ten. There is – I can sort of call attention to the octagon up to the right that says, "Stop." So that one is in the middle of sort of the linear regression steps where you look and you say, "Are your R squared values high enough?" and I'll discuss that a little bit later but what it's basically saying is, "Do you have a good enough correlation?" Are you satisfied with it and if the answer is – actually, if the answer's no, then you could say, "Okay, well basically is that extremely important? Are highly accurate predictions needed?" If the answer is no, then you can go ahead and finish but if the R squared values are really low and it's important to you that they be high, then you would follow the no/yes paths to the octagon and that sort of triggers the conclusion that more advanced methods should be used, because, you know, you probably need a commercial statistical software package or maybe you just need someone to take a look at the spreadsheet and really think about segmenting your data differently but anyway, that's just sort of an indication that these are some simple methods and maybe you need to go beyond, if it's important to you.
So that is what that is. And so with that, I'm gonna start with the high level evaluation portion, which is one through six. So step one, gather your raw data and, you know, data. That can mean just about anything, so what we're meaning by that, really, at this step is utility consumption. That's mostly electricity and natural gas for most people – could be propane. You could collect water and sewer. In our example spreadsheet, we don't do anything with that information but maybe you would end up liking to track that or predict your usage, so just included it in there as something that someone might be interested in. You could also collect information about your cost but again, we don't do anything with that in our example spreadsheet, so, you know, that would just be if somebody wanted to sort of extend it for their own use and what you see on the image there to the right is a snapshot from our example spreadsheet so the right two columns, annual electric and annual propane or natural gas – that's where people would paste that information.
Step two; you adjust your dates to a 365-day year. This just basically – it standardizes the analysis period. You know, most utility records are not kept in a 365-day period and so it's just nice to make sure that everything – basically, you're comparing apples to apples, that you don't have one set that's 365 days – the other set is 370, the other set is 360. You know, depending on how much energy you're using, that may or may not be a big deal but it's, you know, it's not hard to do. Our suggested way is just take the first and the last month from your year, be that January and December or two different months if you're not on a calendar year. Take the average daily value and if you're five days over, subtract five of those. If you're five days under, add five of those. Something simple like that is sort of our suggested method. There are other methods that you could use.
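As a rough illustration of the adjustment just described, here is a minimal Python sketch. The function name and arguments are hypothetical, and pooling the first and last months into one average daily value is an assumption about how "the average daily value" is computed; the guide or spreadsheet may do it differently.

```python
def adjust_to_365_days(annual_total, days_covered,
                       first_month_total, first_month_days,
                       last_month_total, last_month_days):
    """Return the annual total adjusted to a 365-day year.

    Assumption: the average daily value is the combined consumption of the
    first and last billing months divided by their combined number of days.
    """
    avg_daily = (first_month_total + last_month_total) / (
        first_month_days + last_month_days
    )
    # Positive when the record is short (add days), negative when it is long.
    missing_days = 365 - days_covered
    return annual_total + missing_days * avg_daily


# Example: a record covering 370 days is five days over,
# so five average days are subtracted.
adjusted = adjust_to_365_days(370_000, 370, 31_000, 31, 31_000, 31)
```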
Let's see. Step three; you gather building-specific information. Again, that's kind of vague, so I'll tell you what we're talking about. Basically, for example, these are data that could be relevant to the performance of the restaurant building and in the spreadsheet example, we used transactions. We normalized them for anonymity because that is actually real data that you see in there. Hours of operation, floor area, and there could be others that maybe are more specific to your group of restaurants and then I did wanna call out the bottom two bullets, so this gathering of this information, it's mostly used in the advanced analysis, when you create benchmarking equations but it's still useful to gather up front. It could potentially be used in Step 5A, which you'll see is separating data into categories, so the example didn't do this but you could separate data into let's say 18-hour-day stores versus 24-hour-day stores or something like that. So basically, those transactions, hours and floor area could become categories instead of benchmarking equation variables. So if that doesn't make sense now, it probably will later.
Step four; gather weather data. Now that may seem like, why isn't that just included in data? The reason why it's separated out is because this can actually be kind of a difficult and time-consuming part of the process, so I wanted to call it out separately. I know the annual weather data for each location, so if you have a portfolio of 1,000 stores and maybe there's, you know, 800 locations or something maybe you have some major cities that have a lot of locations in the same weather area, still, 800 weather locations can be pretty challenging, particularly if you want to collect information for – let's say you're looking at 2010 and the next year, you're looking at 2011 – sorry, we're in 2000- next year, you're looking at 2012, etc. If you have to keep downloading these 800 stores, you might wanna – either you can develop some kind of a script for doing this or a thing that we did in the example was to look at sort of normal degree data, meaning that they're normalized over 30 years. It's less time-consuming; you only have to do it once. The downside is that it's not as accurate because, you know, the typical 30-year data could be very different than the particular year that you're looking at, so you're kinda rolling the dice, assuming that your year that you're in is similar to the 30-year data, especially these days. I think the later years are starting to look more and more different from that.
And then I kind of jumped around the slide a little bit but suggested metrics – you don't have to use these metrics but they're what we used in the example and I think they're pretty good ones. HDD50 stands for heating degree day at 50 degrees Fahrenheit. CDD65, cooling degree day at 65 degrees Fahrenheit. Sixty-five is typical, so you probably aren't questioning the CDD65. You might be questioning the HDD50 and that is sort of commercial kitchen-specific because the kitchens operate – they're so hot and they have so many internal gains from the equipment that the balance point temperature in them would be really different as far as at what outside temperature do you need to start heating and because of all the sort of radiant heat and basically even just the BTUs from the kitchen equipment, you wouldn't need to start heating until different temperatures. So anyway.
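For readers unfamiliar with degree days, the sketch below uses the common definition based on daily mean outdoor temperature. It only illustrates what HDD50 and CDD65 measure; the example in the webinar used 30-year normal degree day data rather than computing them, and the function names here are hypothetical.

```python
def heating_degree_days(daily_mean_temps_f, base_f=50.0):
    """Sum of degrees that each day's mean temperature falls below the base."""
    return sum(max(0.0, base_f - t) for t in daily_mean_temps_f)


def cooling_degree_days(daily_mean_temps_f, base_f=65.0):
    """Sum of degrees that each day's mean temperature rises above the base."""
    return sum(max(0.0, t - base_f) for t in daily_mean_temps_f)


# Five days of daily mean temperatures in degrees Fahrenheit.
temps = [40, 50, 60, 70, 80]
hdd50 = heating_degree_days(temps)   # only the 40-degree day counts
cdd65 = cooling_degree_days(temps)   # the 70- and 80-degree days count
```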
Step five is actually broken up into a few steps. It's separated into five steps. We have Step 5A that's separating your data into categories; 5B is preparing summary statistics for your raw data – and we say raw 'cause you haven't done anything to it yet except maybe separate it out; 5C, you're preparing histograms and scatter plots; 5D box plots; and then 5E, you've looked at all these things you've done in 5A through 5D and you try to remove outliers that are gonna mess up your correlations. You may need to iterate all of these 5A through 5D steps until your plots and your statistics show data sets with more linearity, less scatter – and less scatter would, you know, potentially give you better benchmarking equations or just a better fit, and fewer outliers. So anyway, you may need to do that. With the scatter, you know, you're gonna have some scatter, particularly in restaurants, in the restaurant building sector you're gonna have a lot. So some of you, I guess may be able to live with more scatter than others. That's kind of an individual decision – and what are you needing the information for as far as are you trying to predict down to the BTU what your next restaurant is gonna use or you just wanted to get a general picture of your portfolio.
So 5A, separating them into categories. So categories – what we mean by that, in the example, we separated them into – one thing we did was separate them into store types, which were stand-alone, meaning that it's one building. You see this a lot for retail, but you would see it in quick service restaurants also, or full-service. So it's just a building and there's no other, you know, I guess, store on the other side, basically versus a strip mall where you could be in the middle of five stores and so two of your walls are sharing heat with another sort of tenant. And then food court – I think most people know what that is. It's inside of a mall. The important thing there is that even in the stand-alone, your HVAC is kind of contained in one space whereas with a food court, it's not, it spills over into the general mall, which somebody else pays for. And so anyway, those are ideas.
Another subdataset could be whether or not parking lot lighting is included in the use and this – it may seem like an odd little thing to pick on, I guess, but you know a lotta places, especially quick service restaurants can be open for long hours and so then they want their parking lot lights to be on. It can be a really significant energy use and so if you have a bunch of restaurants that don't have that on their meter and then a bunch that do, you could kind of be scratching your head for quite a while trying to figure out why this other batch uses so much more energy and it's just something as simple as it has its parking lot lighting on that meter and the other ones don't.
Menu type – I mentioned before. Let's say just as a hypothetical example, some of your restaurants cook fried chicken and some of them make tacos. You could have really different cooking equipment and it could be on different schedules. So it would make sense to separate those out into different groups of data. So that's when I say subdatasets, that's – the fried chicken would be one subdataset and the tacos would be another subdataset and other factors.
So – and just for people to know, you should try to have at least 50 stores in each of your subdatasets to have statistical significance. You know, if you have two stores in a set, that really – they're almost random, in a way, as far as the dataset. You can't really draw many conclusions from that. So you wanna try and have at least 50 stores.
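A minimal sketch of Step 5A in Python, assuming each store is a dictionary with hypothetical field names such as "store_type" and "parking_lot_lighting". It groups stores into subdatasets and lists the groups that fall below the 50-store guideline mentioned above.

```python
from collections import defaultdict


def split_into_subdatasets(stores, keys):
    """Group store records by the values of the chosen category fields."""
    groups = defaultdict(list)
    for store in stores:
        label = tuple(store[k] for k in keys)
        groups[label].append(store)
    return dict(groups)


def undersized_subdatasets(groups, min_stores=50):
    """Return labels of subdatasets with fewer stores than the guideline."""
    return [label for label, members in groups.items()
            if len(members) < min_stores]


# Hypothetical portfolio: 60 stand-alone stores and 3 food court stores.
stores = (
    [{"id": i, "store_type": "stand-alone"} for i in range(60)]
    + [{"id": 100 + i, "store_type": "food court"} for i in range(3)]
)
groups = split_into_subdatasets(stores, ["store_type"])
too_small = undersized_subdatasets(groups)
```

A subdataset that comes back as too small may need to be merged with a similar category before drawing conclusions from it.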
Okay, so 5B, prepare your summary statistics and I did wanna emphasize here, this is a very basic look at the data. I know I had a colleague look at this sort of with a statistical eye and, you know, it was almost confusing that this was so basic that, you know, how can you – you haven't done anything to the data yet. You're just looking at max, min, median. This is just to get a very, very basic overview of what is the ballpark of what kind of energy consumption I can expect. There are questions like, are the mean and the median close together? If so, maybe your data are kinda clustered tightly together. If not, then maybe they're skewed towards one way. Skewing is not necessarily bad but it just tells you something about your stores and the standard deviation. So these are just things to calculate to sort of help you get a handle on what you're looking at, I guess.
And then do they support the categories chosen in step 5A. I'm gonna go back a second. So let's say you chose – I'm gonna separate mine out into stand-alone, strip mall and food court. So – and then you look at your summary statistics – are the maximum and minimum reasonable numbers? Are you getting a reasonable standard deviation that looks like something you can work with or is the quantity of it the exact same as, I don't know let's say, is the standard deviation so large that it itself could be the consumption for an entire store? If the answer to those questions is no, you might wanna go back and choose different categories.
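The summary statistics in Step 5B can be sketched with Python's standard library as follows. The function name and the choice of the sample standard deviation are assumptions; the example spreadsheet may use a different convention.

```python
import statistics


def summarize(annual_use):
    """Basic summary statistics for one subdataset of annual energy use."""
    return {
        "count": len(annual_use),
        "min": min(annual_use),
        "max": max(annual_use),
        "mean": statistics.mean(annual_use),
        "median": statistics.median(annual_use),
        # Sample standard deviation (n - 1 in the denominator).
        "stdev": statistics.stdev(annual_use),
    }


# Hypothetical annual electricity use in kWh for three stores.
summary = summarize([200_000, 300_000, 400_000])
```

Comparing "mean" with "median" gives the quick skew check described above, and comparing "stdev" with a typical store's consumption gives the sanity check on the chosen categories.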
5C, so these are the histograms and some other things I'll talk about next. These are also tools to sort of help you visualize what my data are, like basically what do they include? Are they scattered? Are they skewed? Do they have a ton of outliers? So, somebody isn't gonna have to do all of these plot types, but we just feel like – it's good to do all of them to get a really good handle on what you have and what you're looking at with your stores. So anyway, one of these tools is a histogram. Shows you the whole distribution of the data. If somebody's not familiar with a histogram, it's just basically a count of stores within certain bins. So let's say on the top right one there, you have bins of kilowatt hours of annual electricity usages and we have the number of stores that fall in that bin. So the numbers are a little small to read but, you know, how many stores use, I don't know, let's say 300,000 kilowatt hours a year, between 300 and 325, and then a whole lot more stores use between 325 and 350. I'm not reading off the slide so that may not correspond with what's there, but anyway, that's what it tells you.
Step 5C, so more about histograms – basically, that first bullet is pretty much what I just said. On the X-axis where you have the bins of say, kilowatt hours in this case, you could choose non-symmetrical bins. You could choose them of different sizes but we recommend not doing that because it would be confusing so – but it's an option. And just as far as looking at the shape. I mean, you could always do it but then the shape is not gonna tell you the same thing as it would if they're equally sized. In general, what histograms can tell you – they can give you rough numbers of what to expect for most stores. So looking at this graph, this example graph here, I'm gonna look at it since I can actually read it now, and it tells me that between 200,000 and 400,000 kilowatt hours a year, I could expect that for the vast majority of the stores and I could just quickly say that from looking at that graph. How many outliers – so looking at the graph, now this one, you know, well I guess, the scale on the Y-axis is enough that you could see that sometimes you'll have – and say we have 0 to 5 to 10 to 15. If you have 0 to 20, you might not be able to see it with one outlier but anyway. I think with these you can see there aren't too many – there look like a few that are on the really low end that you can see on the graph but, you know, you could see if this was peppered with all these tiny little things at both ends, you could think, "Okay, this distribution might have a lot of outliers," just a basic first pass thing and whether the values are widely spread or tightly clustered. So if that graph had a lot of high values right in the middle and then dropped off really quickly on the right and the left, that would be tightly clustered.
If it was widely spread out, then that could tell you, okay, well you, you know, let's say from 200,000 to 400,000 – if it was really tightly clustered I could say that most of them are gonna be at 300,000 but if it was spread really wide, I could say, no it really has to just be in that range. I can't make that call of that they're all gonna be at 300,000. So anyway.
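To make the idea of counting stores in equally sized bins concrete, here is a small Python sketch. The 25,000 kWh bin width matches the bins mentioned in the talk, but the function name and the data are made up for illustration.

```python
def histogram(values, bin_width, start=0):
    """Count values in equal-width bins; keys are each bin's lower edge.

    A value equal to a bin edge is counted in the bin that starts there.
    """
    counts = {}
    for v in values:
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical annual electricity use in kWh for five stores.
annual_kwh = [300_000, 310_000, 330_000, 349_999, 350_000]
bins = histogram(annual_kwh, bin_width=25_000)
```

Here two stores land in the 300,000 to 325,000 bin, two in the 325,000 to 350,000 bin, and one in the bin starting at 350,000.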
That's just – these are kind of qualitative things that you're getting from these graphs. 5C; preparing scatter plots. Scatter plots – another way to plot your data and look at them – Y-axis, you have your say, kilowatt hours annual usage here and then for the X-axis for this example we chose, normalized transactions, you could put other things. You could put hours of operation or things like that. The conclusions that they drew was that transactions were the most likely to have the biggest effect and so that's why we chose to graph that but it's well worth it to play around with different ones for your own data and see what you come up with. So anyway that's what was on the X-axis for us.
And here's kind of what they can tell you. Again rough numbers to expect. You can – this is just another way to visualize it here but you can kinda see most of those dots do fall between – actually this is a different set of data. I was gonna say these ones look more like in between 300,000 and 500,000 but this is a different set of data than that other graph. How many outliers – I personally think it's a little easier to look at outliers on this type of graph but other people might disagree. But, you know, it's pretty easy to see a clump and let's see – yeah. And then just see little dots off to the outside. Those are the outliers, potentially. And does the X-axis value affect the Y-axis value – so do the transactions here seem to affect the Y-axis value. That would be basically the R squared; we're looking at how well does it fit the line. This example – and we highlight it here, "example of low R squared," just so that you wouldn't think, because we put this in the example, that therefore it's really good, that it's exemplary as an R squared – .239 is not exemplary as an R squared. It's pretty low. In sort of statistics class you would learn that – well, when I took it, I learned that more like 90 or 95 is what you want. I've learned sort of in, I guess, more practical world, you might wanna have 70 or 80 percent or so. In the restaurant world, from what I've seen and what the collaborators with us have seen, you would be pretty lucky to get 70 percent. In our example data, we got – the best we got was 60 and actually, I think I mentioned that on a different slide but just as an aside.
But anyway, the R squared, if you do this with the Excel trend line, in Excel, it'll give you the R squared automatically and that's just sort of a – if you were to do that and it popped up with an R squared of 80 percent, you should be pretty happy with that. But if you pop up with 24 percent, don't despair, you could still draw conclusions from it.
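For anyone working outside Excel, the R squared of a straight-line fit can be computed by hand. The sketch below uses ordinary least squares with one predictor, which is the standard definition of a linear trend line; the function name and the data are illustrative only.

```python
def linear_fit_r_squared(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# x could be normalized transactions and y annual kWh; these numbers are made up.
slope, intercept, r2 = linear_fit_r_squared([1, 2, 3], [1, 3, 2])
```

An R squared of 0.25, as in this toy data, means the line explains about a quarter of the variation in y.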
Okay, 5D, here's another type of plot that could help you visualize your data. It's useful for displaying data in the middle of the distribution. Actually, it's also useful at the end. I'll get into that a little bit more later but it basically – box plots are especially useful for identifying outliers. So – and there's a specific way that they do it and they basically – so those red lines that you see, I'll sort of – there, you see two red lines and then you see some green boxes in the middle. The green boxes represent the interquartile range which is 25th percentile to 75th percentile and a statistics textbook or something, you know, would tell you this. And the report actually references it in this link, National Institute of Standards and Technology, that gives you a good overview but to give you a quick one here, anything within the green box, that's 25th to 75th percentile, and the upper red fence – or, you would call it the upper inner fence, that would be based on the amount of that green box. So let's say high value of green minus low value of green, sort of the length, I guess, of it, or the height of it, that amount times 1.5 added to the top of the green box, that's how you come up with that red value. Again, this isn't something you should memorize from this presentation but just to give you an idea, that's how, sort of, the statistics of it works out and it's been determined that anything above that is most likely an outlier. And this is sort of a common way of displaying your data, statistically, I guess, and that's just generally sort of a rule, that anything above that upper range is probably too high – it's suspiciously high, I guess I should say. Anything below the lower one is suspiciously low. Does it mean that there's something wrong? Not necessarily but you should look at it. You'll notice with this dataset, there's a lot more, you know, "suspiciously low" values than suspiciously high. So that's just something to keep in mind.
What that could mean is that this set of stores has a lot of really good performers. So that doesn't necessarily mean that they have a bunch of missing data. And I believe that's all I would wanna tell you about that graph.
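The fence calculation described for the box plot can be sketched in Python as follows. Quartile conventions differ between tools; this sketch assumes the "inclusive" method from Python's statistics module, which may not match the example spreadsheet exactly. The function names are hypothetical.

```python
import statistics


def inner_fences(values):
    """Return (lower_fence, upper_fence) using 1.5 times the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box, 25th to 75th percentile
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(values):
    """Values below the lower fence or above the upper fence."""
    lower, upper = inner_fences(values)
    return [v for v in values if v < lower or v > upper]


# Made-up data: nine ordinary values and one suspiciously high one.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
suspicious = potential_outliers(data)
```

As the talk notes, a value flagged this way is only a candidate for removal and should be investigated before being dropped.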
So 5E, I mentioned before that steps 5A through 5D help you get a handle on your data and then 5E, you should have a good idea of what are your outliers and so in step 5E, you should remove them. And also the example spreadsheet, it highlights it on an inputs tab potential outliers in gray and it says this in the spreadsheet but, you know, so it helps you – anything above the upper fence or below the lower fence gets highlighted in gray automatically. So you can consider removing it but don't do it automatically.
Just in general, as a general note, outliers can be caused by either missing data – you know, sometimes accounting systems just lose a bill or they switch systems and some of them didn't get transported over or whatever. Sometimes you have too much, you have a double month and that could just be because they collected them as calendar months but let's say in January you had a bill on the second and a bill on the 31st and so then, when you summarized it it looked like January had twice the consumption than it should and February had none. You know, and so it's just simple stuff like that. You could have a problem with your meter. You can think of all kinds of things. You could have a renovation in your store so that, you know, that'll be something significant enough to make it look like an outlier versus just an inefficient piece of equipment. You doubled the size of your store; you had it open to the environment for two months because you were working on it – whatever it is, or something else unusual. So anyway, you look at your sort of suspicious points; you remove them or don't remove them and after you do that, look at your statistics again, to see if they've improved. So let's say if your problem was that you had an R squared of .39 here or maybe I should say .24 and then you sort of, you go back, you remove your outliers, maybe you re-categorize, you put things in different groups, you look at it again and your R squared went up to .39. So you've created a stronger correlation. You may want to iterate – you could have a number that you're comfortable with – or it could be that you keep iterating, you keep coming up with .39, so it's not really worth the effort. Maybe you should just get comfortable with .39. So that's kind of the process here.
And step six is to perform the evaluation. So basically, you changed your dataset as much as you want to. You've categorized it as much as you want to, now what can you glean from this is the idea here. First of all, are your store categories even significant? You have your stand-alone versus food court, for example. Look at your summary statistics, are they similar? Did you separate into those two – did the statistics look very similar for those? Then maybe the next time around, save yourself the trouble and just, you know, lump them into one group. Then there's a range of performance, max to min. I mentioned on another graph roughly 400,000 to 200,000 kWh per year – that is a very large range, of course, but, you know it just gives someone a ballpark of what are we looking for. So that if I have a store come in at 600,000, I don't have to think too much, I don't have to do an analysis, I can say, "Okay, well, there might be something really wrong going on here. I need to look into that." Whereas if something comes in at 350, you might think, "Okay, I'm gonna prioritize looking at the 600 over the 350." So that's what your average performance is.
Distribution is it skewed towards high or low? That's not necessarily a bad thing but it's good to know. You know, if you have sort of a lump of stores but a lot of them are towards the higher end of say the 200 to 400 range, then maybe 350 seems more typical than 250. Were there a bunch of outliers? Were there few? And depending on what the outlier reason was – if it was because of a bunch of accounting errors, then maybe that's just a one-time thing, you were transferring systems. Maybe it's, you know, showed some kind of an issue that you need to work out. Or if you have a whole lot of metering problems, you need to go fix that with the utility. So anyway, these are just the things that can tell you.
But then one of the crux of here I guess is what we're trying to get at is that once you've put your data into your categories, you've sort of done some really basic analysis on it – now you might wanna see, well which buildings are potential retrofit candidates 'cause that is sort of the purpose is to save energy. In order to save energy with existing buildings, you have to do retrofits, usually. And so you need to know – if you have 1,000 buildings, you're not just gonna automatically well, do, you know, I'm gonna go do an energy audit on 1,000 buildings. You'd rather do an audit on 20 buildings and so you need to identify well what are the, let's say, I don't know, 50 buildings I wanna investigate to choose 20?
The spreadsheet to help you out with this highlights in yellow the potential retrofit candidates – and actually, I'm gonna go back a few slides to describe this kinda graph. So one of the easiest ways to see that, I think, and this is where the spreadsheet gets it from, is that what we call the preliminary retrofit candidates are the ones that are above the green box so any data points that would be above that green box that's the 25th percentile to the 75th, but below the red fence – those would be the potential candidates so they use more than sort of average expected energy but not so much that the data point looks suspicious. So those would be the ones that we suggest that you look at first are the ones above the green, below the red, and they're automatically highlighted in yellow in that spreadsheet and this is also discussed in the report.
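The screening rule just described (above the green box, below the red fence) can be sketched as follows. This is only an illustration under stated assumptions: the transcript does not define the red fence, so the code assumes the common box-plot convention of the 75th percentile plus 1.5 times the interquartile range; the store names, kWh values, and the function name are hypothetical, and the spreadsheet's own quartile method may differ.

```python
from statistics import quantiles

def screen_stores(annual_kwh, fence_multiplier=1.5):
    """Split stores into preliminary retrofit candidates and suspicious points.

    annual_kwh: dict mapping store name -> annual consumption in kWh.
    Candidates sit above the 75th percentile but at or below the upper fence;
    suspicious points sit above the fence.
    """
    values = sorted(annual_kwh.values())
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    # Assumed definition of the "red fence" (not stated in the webinar).
    upper_fence = q3 + fence_multiplier * (q3 - q1)
    candidates = [s for s, kwh in annual_kwh.items() if q3 < kwh <= upper_fence]
    suspicious = [s for s, kwh in annual_kwh.items() if kwh > upper_fence]
    return candidates, suspicious

# Hypothetical portfolio of one store type.
stores = {
    "S1": 200_000, "S2": 240_000, "S3": 260_000,
    "S4": 280_000, "S5": 300_000, "S6": 320_000,
    "S7": 340_000, "S8": 380_000, "S9": 700_000,
}

candidates, suspicious = screen_stores(stores)
print("Preliminary retrofit candidates:", candidates)
print("Suspicious data points:", suspicious)
```

Stores in the first list are the ones to investigate first; stores in the second list are the ones whose data deserve a check before anything else.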
Okay and then you have a decision to make. After step six and let's say you've investigated these preliminary candidates, do you wanna go on to step seven and do an advanced evaluation and what you would do there is you end up developing benchmarking equations. You can predict energy uses given different operating conditions. I don't know if you wanna predict it for future years, maybe you're considering taking a bunch of your stores from 18 hours a day to 24 hours a day; you could use the equation to plug in more hours and predict the usage. But I warn you, strong correlations are not guaranteed. Just because you invest all the effort and come up with these equations, doesn't mean that you're gonna have as much confidence in them. Sometimes you need to adjust your expectations if you want to, you know, what my statistics teacher said in college that you need to have 95 percent – you're probably not gonna get it and so if that's gonna bother you, you just shouldn't do the analysis. But if you can live with a 40 percent or a 50 percent and just understand what that means; that what you're looking at is only explaining 40 to 50 percent of the variation, then you may wanna go ahead and invest this time 'cause it could be useful to have these equations.
So assuming that you did decide to go on, we're gonna look now at the advanced evaluation, seven through ten.
So step seven, which is also delineated into three substeps, that's basically about performing linear regressions. That was the method that we chose to look at those restaurants in sort of a more "advanced" way, and I say that in quotes because there are really a lot more sophisticated things you could do, statistically, but someone who's, let's say, a restaurant owner, a portfolio owner, an energy manager for a chain of restaurants, they're not statisticians. Even if they possess that skill, they probably don't have time to do that, so for a lot of people that's just not enough of a priority that they can do a full-blown statistical analysis. So that was kind of why we chose to stop where we did. So as far as step seven, 7A is just identifying what your significant variables are; 7B, do the actual regression. We chose linear. I will sort of side note here. The people that we partnered with, I think, did look at higher order regressions and the increase in quality wasn't enough to justify it, so we chose to stick with linear. So you certainly could do higher order ones but it's not recommended. 7C, evaluate the quality of the regression that you come up with. And the spreadsheet automatically completes this in the regression analysis tab so that's a pretty useful thing, you know, if you're using that example spreadsheet.
So let's see, 7A is identify the significant variables and what we're talking about here is, in the example, transactions, hours, floor area, and weather, which is HDD50 and CDD65 for us. You could have other variables but these are the ones that we picked and we call them independent variables. They're basically like the X in Y equals AX plus B. These are like the Xs: which ones might be significant in predicting performance? So if you think that floor area is not gonna have much of an impact, why bother collecting it for all of your stores, you know? You don't wanna choose every single thing. You sort of want to choose the ones that are most likely, unless you have some time. It's certainly interesting to play with different variables.
Performance by store type – so when we say performance, what we mean is consumption, in this case. We're not talking about demand or some other, you know, percentile. We're just talking about kilowatt hours and BTUs or whatever and that was kind of the Y variable in what I'm saying here.
7B; do the regressions. If you're using Excel or if you're using our spreadsheet, this is done automatically. You can certainly do it by hand if you want. We used Excel's LINEST function. The equation type – this is just a really simplified one, just to show you what it means – would be kWh, which is like the Y value, equals a, as a constant, plus b times HDD50. I say HDD50 slope for b – what that means, slope, for math, is change in Y over change in X, so it's change in kilowatt hours in relation to change in HDD50. So for every one unit of HDD50 you go up, what change does that produce in your kilowatt hours? So we just called it slope. Anyway, so that would be sort of a regular example of what a linear regression equation would look like.
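For readers working outside Excel, a one-variable version of this regression can be sketched in a few lines. This is a stand-in for what LINEST does in the spreadsheet, not a copy of it; the function name and the data points are hypothetical, and the data are chosen to fall exactly on a line so the constant and the slope are easy to read.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical observations: HDD50 and the matching electricity use in kWh.
hdd50 = [100, 200, 300, 400]
kwh = [21_500, 23_000, 24_500, 26_000]

a, b = fit_line(hdd50, kwh)
print(f"kWh = {a:.0f} + {b:.1f} * HDD50")
```

Here `a` is the consumption the equation gives when HDD50 is zero, and `b` is the HDD50 slope: the change in kWh for each additional unit of HDD50.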
7C; you've done this regression. How well does it predict performance? And, you know, I'll go ahead and sort of point to the, I guess go to the second bullet first. On the right there, that's a snapshot from the spreadsheet tool. You'll see that there are a lot of different combinations. The reason is that – well let's say at the top we combine transactions, weekly hours and floor area as sort of X as independent variables and looked at what effects they had but maybe it's better to look at only transactions and floor area or maybe only transactions. So we kinda did all the combinations to see which one gives you the best R squared and if, let's say, transactions gives you the best but it's sort of at a tie with transactions and floor area, well in one of those, you only have to collect one set of data so you might as well use the simpler one and just go with transactions. So it can kinda tell you things like that to do sort of the whole set of combinations.
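The "try all the combinations" idea can be sketched as below. It is a minimal illustration in plain Python, assuming ordinary least squares with a constant term; the data, the helper names `solve` and `fit`, and the `TIE_TOLERANCE` of 0.01 used to decide what counts as a tie are all hypothetical and are not taken from the spreadsheet.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit(columns, y):
    """Least-squares fit of y = a + sum(b_i * x_i); returns (coefficients, r_squared)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coeffs = solve(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coeffs, 1 - ss_res / ss_tot

# Hypothetical data for eight stores of one type (thousands of units).
variables = {
    "transactions": [90, 120, 150, 110, 200, 170, 130, 160],
    "weekly_hours": [100, 126, 112, 168, 119, 105, 140, 133],
    "floor_area": [2.0, 2.5, 3.2, 2.2, 2.8, 3.5, 2.4, 3.0],
}
kwh = [210, 265, 300, 262, 385, 330, 282, 318]

# Fit every non-empty combination of variables and record its R squared.
results = []
for size in range(1, len(variables) + 1):
    for names in combinations(variables, size):
        _, r2 = fit([variables[name] for name in names], kwh)
        results.append((names, r2))

# Prefer the model with the fewest variables among those that roughly tie for best.
TIE_TOLERANCE = 0.01  # assumed threshold for calling two R squared values a tie
best_r2 = max(r2 for _, r2 in results)
chosen = min((r for r in results if best_r2 - r[1] <= TIE_TOLERANCE),
             key=lambda r: len(r[0]))

for names, r2 in sorted(results, key=lambda r: -r[1]):
    print(f"{r2:.3f}  {', '.join(names)}")
print("Chosen:", ", ".join(chosen[0]))
```

Adding a variable never lowers R squared in this kind of fit, which is why the tie rule matters: the full model always looks at least as good, even when a simpler one needs less data collection.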
I've put here high R squared greater than 70 to 80 percent. You also could use other statistical parameters, like p-values and F tests. If you wanna get into that, there's commercial software, there's your statistics textbook, you know, there's different ways, but I think most of you will probably just stick with the R squared. And yeah, so for the best results, you really want to do the combinations. If you really don't wanna spend that time and you wanna either use what we found was the best for our set of data, or you just wanna use one that includes all of them, you can certainly do that, and when I say all of them, that's the top cell in this example.
Okay, so anyway, that's evaluating the regression quality. So how good were your correlations? Step eight, assuming that you think that the quality was good enough to do this, you just create your equations and this one, really, it's kind of a – well, you choose the, first of all, the strongest one. If you've done a whole bunch of different ones, sort of all the combinations and possibilities, the variables, you would choose the best one. If you only tried one, you use that one. So if you are okay with how strong the correlation was, then you just, you make the equation – and I apologize for the appearance of this. In the final version that gets posted on the website, it will be a cleaner version. This is some kind of an issue that we had uploading the files to Live Meeting, so apologies for that. Actually, I'm gonna go back to the other one 'cause it looks pretty much exactly the same as this equation here, this kWh, equals and then this linear equation. That's pretty much what it would look like, so you pick the best one of those with the best a, the best b, and the best c.
And if we could go back here. Okay. Yeah, and then that screen shot image down there is a screen shot from the spreadsheet tool and that just shows you how, let's say for store type A, your best set of constants plus slopes are given along that sort of top row. For store type B you have a different set. So you have different benchmarking equations for each of A, B, C and D, which makes sense. Your store type A is gonna have different set of predictions than store type B – that's the reason why you separated them out. If they're the same, then you probably shouldn't have separated them out.
Step nine, so you have your equation. This is kind of a very simple step but, you know, you have your Xs, plug them in and see what your Ys should be, your expected Ys. So when I say X, that's things like transactions, floor area. You have your actual known operation characteristics that you collected data for over the whole year. Plug them in and see according to your equations – and I'm just gonna remind you all that this equation was based on your whole subset of data, so if I have restaurant 123, this is not based on only the performance of restaurant 123, it's based on the entire dataset. That gives me the equation that predicts how much both restaurant 123 and restaurant 345 and restaurant 567 or whatever you wanna call them – that equation predicts it for the whole set, given each individual set of Xs. So just to be clear of what we're talking about here.
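Step nine amounts to evaluating each store type's equation with that store's own operating data. A minimal sketch, assuming one constant and one slope per variable for each store type; the coefficient values, variable names, and function name below are made up for illustration and do not come from the report.

```python
# Hypothetical benchmarking equations: constant "a" plus one slope per variable.
EQUATIONS = {
    "A": {"a": 50_000.0, "transactions": 1.25, "floor_area": 20.0},
    "B": {"a": 30_000.0, "transactions": 1.5, "floor_area": 10.0},
}

def expected_kwh(store_type, **operating_data):
    """Plug a store's known operating data into its store type's equation."""
    equation = EQUATIONS[store_type]
    return equation["a"] + sum(
        equation[name] * value for name, value in operating_data.items()
    )

# The same type-A equation is applied to every type-A store,
# each with its own transactions and floor area.
store_123 = expected_kwh("A", transactions=150_000, floor_area=2_500)
store_345 = expected_kwh("A", transactions=100_000, floor_area=2_000)
print(store_123, store_345)
```

The equation itself comes from the whole data set for that store type; only the inputs differ from store to store.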
So basically, you know, plug and chug in step nine. Step ten; compare. So you have – you've predicted what you think – how much energy you think that each store should use, now how much did it use? The spreadsheet tool gives you a table for this. You can easily create it yourself, too but, you know, you could either compare how many total, let's say, kilowatt hours or therms did I expect, I expected the store to use? It ___ therms and instead it used 60, though, it's just a wide example. It's probably easier or more useful to compare percents. So, you know, how many percentage points off was my expectation from what the performance was?
General guidelines – and these could vary from data set to data set, so from the data that the report looked at, these were some good guidelines and they matched it. You could find, you know, let's say the five percent number is more like eight percent for yours but this is a general guideline. If there's less than five percent variation from the expected versus the actual, that's probably in the noise. That's probably not a big deal. Five to ten percent, note it. It could be something but you might not need to worry too much. And then if it's more like 10 to 25 percent – and this is individual restaurants we're talking about – consider an audit or just look into it. Don't just automatically audit it, because you might have a lot of these stores that fall into that category, unless you have those kind of resources. What that means is look into them, consider do I need an audit or not. It could be that they have a very valid reason for using more energy, you know, either they're a subtype that – a store that would've given you less than 50 in its subtype, so you lumped it into a bigger category but it really is a little bit different. Let's say it just has a different set of cooking equipment, but it has like a few more menu options than most of the stores but there's only ten of them that are like that and maybe it's a pilot thing you're trying. So you can't have them as their own set but they are a little different than the greater set that they're in. So in that case, there's a good explanation for it. You don't need to go audit them; you already know what's going on. You know, or you look into it; everything seems to make sense except that they use more energy, you may wanna audit them. Maybe they have an old piece of equipment that's using way too much energy. Maybe someone is leaving all the equipment on at night and not turning it off and, you know, they're wasting energy.
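These guideline bands, together with the more-than-25-percent band described next, can be expressed as a small lookup. The sketch assumes the variation is measured as a percentage of the expected value and treated symmetrically (over or under), and it picks one way of handling the exact boundaries at 5, 10, and 25 percent; none of those choices is specified in the webinar, and the thresholds themselves may need adjusting for a given data set.

```python
def percent_variation(actual, expected):
    """Difference between actual and expected use, as a percent of expected."""
    return (actual - expected) / expected * 100.0

def guideline(actual, expected):
    """Map a store's variation onto the general guideline bands."""
    pct = abs(percent_variation(actual, expected))  # assumed: sign is ignored
    if pct < 5:
        return "probably in the noise"
    if pct <= 10:
        return "note it"
    if pct <= 25:
        return "look into it; consider an audit"
    return "explore data errors"

# Hypothetical stores: (actual kWh, expected kWh).
examples = {
    "store 123": (309_000, 300_000),
    "store 345": (324_000, 300_000),
    "store 567": (360_000, 300_000),
    "store 789": (480_000, 300_000),
}
for name, (actual, expected) in examples.items():
    print(name, f"{percent_variation(actual, expected):+.0f}%", guideline(actual, expected))
```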
So anyway, and then greater than 25 percent variation, you should really sort of explore the data errors. That's pretty unusual. So I mean, it could be a reasonable explanation but probably it's an error, so those are just general guidelines there. Let's see here. That was the advanced evaluation, so let's say that you've gone through steps one through ten. You've looked at your data. You've divided it up; you have your benchmarking equations; you've seen your actual versus expected – now what? Basically, well, one thing – if you have more than 25 percent variation, you might wanna remove them and iterate your calculations to see maybe your R squared's improved. So that's just kind of going back and refining. Look into those stores with that 10 to 25 percent variation. As I was kind of saying earlier, actually, these points were what I was talking about earlier. Maybe they're mild outliers. Maybe there was a decent explanation or maybe they indicate that that store is using more energy than it should and it warrants an audit or at the very least a phone call to the manager. Any – and so that could be a next step.
And then you might have some customization of the spreadsheet tools that would meet the needs of your portfolio better. Maybe, you know, they have store type A, B, C and D; maybe you need to have more store types. Maybe you need to have different graphs displaying and maybe you have a favorite statistical, you know, methodology for looking at things and you need to sort of adapt it to where it would produce the graphs or equations or whatever it is. So any necessary customization – that could be a next step if you're interested.
And then another step that we'd like for you to do is to give feedback to DOE or NREL about this process; about if it helped you – especially if it helps you decide to pursue some retrofits based on energy and actually saving some energy but also if you tried it and if your conclusion was just that my data are widely scattered and this only told me that my data were widely scattered and now I spent three hours and I don't have anything else to show for it. That's feedback, too. I mean, we would, you know, hopefully we get the former, but we would need to know, especially if you have maybe any ideas on, "You know what if your tool did this instead, this really would've helped me a lot more." So that would be sort of another important next step.
I also would like to have some acknowledgements. I mentioned them up front but NREL gratefully acknowledges the contributions of the following individuals: Vernon Smith, Roger Hedrick with Architectural Energy Corporation, Rachel Romero at National Renewable Energy Lab, and also members, we say individuals, all members of the restaurant – REA Restaurant Project Team. They really helped – they contributed not only some data but some know-how, a lot of information and sort of, I don't know, you know, basic testing of these ideas and what would be useful, so I'd like to thank them for helping shape what this is. And basically, I can just say here's my contact information. We mentioned feedback to DOE or NREL. Here's one way to do it: you can e-mail me at Kristin.Field at nrel dot gov. If you know other people, if you have a relationship with someone that's been working on this project, then you don't have to contact me, but there's a way to contact someone if you need one.
And I think with that, we can turn it over to Michelle for the question portion. Thank you.
Yes. So right now we wanna encourage you to go ahead and submit your questions through the Q&A link at the top bar of your screen. And our speaker will answer as many as possible. I think we've only gotten one thus far, so if you have questions, we ask you to ask them now.
Okay and I – okay.
Yes, go ahead.
I do see the one question and I can answer it. The question is what does the constant a equal in step B? So I think that is step 7B, I believe? I'm gonna – just a second here. Apologize – I have to go through these one by one. Yes, okay, so it looks like they were talking about step 7B. The constant 'a' – so if you had zero heating degree days, say 50. You had zero floor area – which is not very likely but or, you know, yeah, let's just pretend zero floor area, then your kilowatt hour usage would be equal to 'a'. So it's just – it's the portion of that that's not dependent on any of your other variables. So it's kinda like in a linear equation, it's the constant. 'A' is the constant; it doesn't change, no matter what your HDD50, hours of operation, anything are. So I hope that answered it. That's what that means.
All right and since we just got one additional question, I'm just gonna go ahead and read it, make it a bit easier. You know, "What type of recommendations do you give high-end users?" was the next question.
Okay and recommendation for high-end users – so I'm gonna guess that – I guess I'm wondering what a high-end user is; if that's a user that has like, someone that has particularly high use, sort of end uses with the restaurant, or if that's high end as in a very sophisticated user. Let's see. I can answer the first one, I guess. Or sorry, the second one that I just mentioned, which would be if you're very sophisticated. I would recommend going through the whole thing, going all the way to step ten, and seeing, I guess, if you're satisfied with it; if you feel like you've gotten what you needed as far as understanding your portfolio of restaurants and understanding which ones you need to focus on. If you haven't, and you're high-end meaning you're sophisticated, look into commercial statistical software, unless there's sort of someone on your staff that's very statistics-savvy – look into that as sort of a next step.
And let's see, and then I guess the other possibility for what the question was, was restaurants that have high end uses, I guess. So I'm thinking maybe very intense kitchen equipment or something. What would I recommend for them? Let's see, well, you know, I think actually most restaurants are like that. I've heard them sort of likened to little labs, you know, that you just – in a tiny little space, you have this very intense equipment. So, you know, I'm trying to think. I guess there is a difference for what you'd recommend for sort of a light kitchen, let's say a place that prepares salads or something like that, depending on if they reject the refrigeration outdoors, versus a place that cooks burgers or is frying all day or something like that. The place that is cooking the burgers and frying all day versus the salad place, I mean, I guess as far as what I'd recommend – trying to think. I don't think I'd have a recommendation that would be different for them than for the other. I think for all of them, I'd recommend going through the analysis. As for the lower intensity kitchen equipment users, you guys might have an easier time getting good correlations, I think. I think it's the ones with the intense equipment that have more trouble. And some of it could just be that, let's say, a different operating hour or even a different manager deciding to turn this off one hour after closing or maybe they stay two hours – if you have a lot of intense equipment, that makes a big difference. So there could be little things that are hard to know and to separate out and with every hour that you have that equipment on, it's significant.
I don't know, that's just kind of a general answer but I think really, I would recommend for all of you that you at least go through one through six if you're able to. Okay.
All right and then we have another question and it's that if any of this information is available online. I can tell you we will be posting a copy of today's slides and a video from today's webinar on the Commercial Building Energy Alliances website on their Webinars page. But as far as plans for the spreadsheet tool and the reporting, maybe you can address –
Yeah, so they're actually both online –
- Okay, good.
- And the locations are in the overview slide, which is slide two. Yes, so I suppose if we don't have any other questions, I'll go ahead and go back to it but you've seen – it's a little slow. But yeah, as Michelle mentioned, this presentation is available online and in the presentation, you have a link to where the spreadsheet and the report are. So basically, yes, they are available online. And the tools, you'll download it, save it as a different name and start playing with it. That's, you know, and feel free to give us feedback on it, though. That'll be the way to really know how well it's doing and how well it represents what you guys need.
If there are more questions, we encourage you to submit them now. We'll give it another minute or two while we go back to the slide.
Okay, there it is.
All right. I'm not seeing additional questions. So at this point, what we'd like to do is go ahead and wrap up. We'd like to go ahead and thank Kristin for her time today and we'd also like to thank all of you for participating. And as we were just discussing, slides from today's presentation and a video of today's webinar will be posted in the next few days on the Commercial Building Energy Alliances website. If you go to www.commercialbuildings.energy.gov/alliances, there's a link there to the webinars page and there's also a webinar archives page and that's where those will be posted in the near future. So we encourage you to go ahead and download those. If you have any trouble finding them, we also have a contact link on the site, so we'll be happy to send those to you, and if there are no further questions, thank you so much for your time and have a great day.
Yeah, thank you.
Thank you for call- thank you for participating in today's conference. You may disconnect at this time.
[End of Audio]