Alrighty...
So it's been a bit since I last posted here, and wow, we have covered quite a bit of material in the last few weeks. It doesn't look like much when you list the topics out:
- Advanced Star Schema Design
- Data Quality Analysis
- Dashboard Design and Analysis
- Web Metrics
- Google Analytics
But... these can get very detailed. In fact, you could probably spend a whole career focusing on any one of these.
Advanced Star Schema Design
So, now that we have submitted the homework and there has been a little time for the material to sink in, the design process is making more sense. Like I said before, mentally transitioning from trying to make everything 3NF (and even 4NF) to "just clump it all together" takes a lot of effort. Also, truly understanding what you are trying to model and get results for was a bit difficult for me when doing the homework. I think I was over-analyzing the process and making it too difficult when a simple approach would have sufficed. Going through the homework was a great learning experience in “simplification.” Fortunately, I work in an environment where there is quite a bit of unprocessed data, and when I can finally take a break from school I'll look into how I would design the schemas for ingesting it into a data mart/warehouse. Too bad I can only get to it on special networks.
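To make the "3NF versus just clump it all together" shift a little more concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. Everything in it is hypothetical (a made-up sales mart with invented table names, columns and rows, not the homework design): one narrow fact table of measurements keyed to wide, denormalized dimension tables, so a question like "revenue by region and category" needs only one join per dimension.

```python
import sqlite3

# Hypothetical star schema. In 3NF, category and brand would live in their own
# tables; here they are "clumped" directly into the product dimension.
SCHEMA = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT, brand TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    revenue     REAL
);
"""

def build_mart() -> sqlite3.Connection:
    """Create an in-memory mart and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                     [(20240101, "2024-01-01", "January", 2024),
                      (20240201, "2024-02-01", "February", 2024)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)",
                     [(1, "Widget", "Tools", "Acme"),
                      (2, "Gizmo", "Toys", "Globex")])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                     [(10, "Tampa", "South"), (20, "Boston", "North")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(20240101, 1, 10, 3, 30.0),
                      (20240101, 2, 20, 1, 25.0),
                      (20240201, 1, 20, 2, 20.0)])
    return conn

def revenue_by_region_and_category(conn: sqlite3.Connection):
    """Answer a business question by joining the fact table to the dimensions it needs."""
    return conn.execute("""
        SELECT s.region, p.category, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_store   s ON f.store_key   = s.store_key
        JOIN dim_product p ON f.product_key = p.product_key
        GROUP BY s.region, p.category
        ORDER BY s.region, p.category
    """).fetchall()

if __name__ == "__main__":
    for row in revenue_by_region_and_category(build_mart()):
        print(row)
```

The redundancy in the dimensions (category and brand repeated on every product row) is the deliberate trade-off: it would be a normalization sin in an operational database, but it keeps reporting queries short.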
Data Quality Analysis
That star design stuff leads right into DQA. The premise is pretty much a no-brainer: garbage in = garbage out (the actual process, on the other hand, is not as quick as the quip). The designs for collapsing all the operational data tables into a "small" set of fact and dimension tables may be great, but if the data in those tables is not consistent, the reports are never going to produce valid results and will always be questionable. So before pushing the data, it should be analyzed for inconsistencies (data profiling) and cleaned up. This could be a very time-consuming process if it had to be done manually, even for relatively simple data sets. Fortunately, there are programs/tools designed specifically for this task. A good thing, especially when one considers that there are organizations with millions if not billions of pieces of data, and the profiling and cleaning process may need to be done numerous times before the resulting set is deemed of high enough quality for ingestion.
Then, once the data is all tidied up and the anomalies are handled, it can be loaded into a data mart/warehouse. Cool! (Yes, geeky.)
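The profile, clean, re-profile loop can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the customer records, the state-abbreviation mapping and the 10-digit phone rule are all hypothetical, and real profiling tools check far more (patterns, ranges, duplicates, referential integrity). It only counts missing values and distinct raw values per column, then normalizes the messy fields so a second profile shows whether the cleanup worked.

```python
from collections import Counter

# Hypothetical operational records with typical inconsistencies.
RECORDS = [
    {"id": 1, "state": "FL", "phone": "813-555-0100"},
    {"id": 2, "state": "Fla.", "phone": "(813) 555-0101"},
    {"id": 3, "state": "florida", "phone": None},
    {"id": 4, "state": "GA", "phone": "404.555.0102"},
]

def profile(records, column):
    """Data profiling: count missing values and how often each distinct raw value appears."""
    values = [r.get(column) for r in records]
    missing = sum(1 for v in values if v in (None, ""))
    distinct = Counter(v for v in values if v not in (None, ""))
    return {"missing": missing, "distinct": dict(distinct)}

# Hypothetical cleanup rule, the kind you would write after reading the profile.
STATE_MAP = {"fl": "FL", "fla.": "FL", "florida": "FL", "ga": "GA"}

def clean_state(value):
    if value is None:
        return None
    return STATE_MAP.get(value.strip().lower(), value)

def clean_phone(value):
    """Keep digits only so every format compares equal; reject anything not 10 digits."""
    if value is None:
        return None
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits if len(digits) == 10 else None

def clean(records):
    return [{**r, "state": clean_state(r["state"]), "phone": clean_phone(r["phone"])}
            for r in records]

if __name__ == "__main__":
    print("before:", profile(RECORDS, "state"))
    print("after: ", profile(clean(RECORDS), "state"))
```

Re-running `profile` on the cleaned output is the point: on real data each pass tends to reveal the next set of anomalies, which is why the cycle may repeat several times before the set is ready to load.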
Dashboard Design and Analysis
Now that all that operational data has been collapsed, cleaned up and loaded, it’s time to do something with it. What? This class is “Business Intelligence,” so it makes sense that we would go over how to extract useful (intelligent) information from all that data and provide it to some business-type folks who can use the results to make decisions. This is where dashboards (and analysis of the information they display) come into play.
The theory is that a dashboard should provide a quick-glance summary of some data set (facts and dimensions) and give it meaning for the business. It should be simple and not require any cross-referencing or lookups to understand. Not much different from the dashboard in a car: speedometer, gas gauge, odometer, maybe engine temperature and battery charge. Granted, a business dashboard would have a little more, like graphs and summaries, but this is the general idea.
The key is figuring out exactly what data to pull for display. This is where a dashboard designer must understand not only the dashboard design tools and underlying data, but also the target audience. In many cases, the same sets of data may need to be presented from different perspectives to suit each audience's focus. Network engineers may want to know who is using the most bandwidth and which sites are accessed most frequently; the finance department may only want to see the costs per user or department for the leased line the internet connection comes through. And somewhere in the middle may be a manager who wants to see a combination, to determine who is costing the company money relative to their productivity.
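Here is a rough sketch of the "same data, different perspective" idea, assuming a hypothetical bandwidth log (user, department, site, megabytes) and a made-up monthly leased-line cost. Both views read the same rows; they just aggregate along different dimensions, and the finance view allocates the line cost in proportion to usage, which is one possible allocation rule among many. The manager's combined view would also need productivity data, so it is left out here.

```python
from collections import defaultdict

# Hypothetical usage log: (user, department, site, megabytes transferred).
USAGE = [
    ("alice", "Engineering", "github.com", 1200),
    ("bob",   "Finance",     "bank.example", 300),
    ("carol", "Engineering", "youtube.com", 2500),
    ("dave",  "Sales",       "crm.example", 500),
]
LEASED_LINE_MONTHLY_COST = 4500.00  # hypothetical figure for illustration

def engineer_view(usage, top_n=2):
    """Network engineers: the heaviest bandwidth users, largest first."""
    per_user = defaultdict(int)
    for user, _dept, _site, mb in usage:
        per_user[user] += mb
    return sorted(per_user.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

def finance_view(usage, monthly_cost=LEASED_LINE_MONTHLY_COST):
    """Finance: the line cost split across departments in proportion to their usage."""
    per_dept = defaultdict(int)
    for _user, dept, _site, mb in usage:
        per_dept[dept] += mb
    total = sum(per_dept.values())
    return {dept: round(monthly_cost * mb / total, 2) for dept, mb in per_dept.items()}

if __name__ == "__main__":
    print("engineers:", engineer_view(USAGE))
    print("finance:  ", finance_view(USAGE))
```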
The visual piece of the dashboard that I just can’t get into is scorecards. Got it, I understand their use. I like to see them and use them. But designing them? I’m not the guy. It’s not that I can’t be creative or “visual” (heck, I’ve been doing photography as a hobby since I was a kid), but building pretty buttons and graphs for someone else isn’t my thing. Maybe it’s lack of exposure, experience or need. Who knows, maybe I’ll change my mind. I did with respect to MS SQL: I swore up and down for years that I would never do databases. Now… I work with MS SQL, PostgreSQL and Oracle, so much so that I have been put on projects just because of my SQL experience. So there is hope :)
[Hmmm. Just re-read what I’ve written so far. If I didn’t know any better, I’d say I knew what the heck I was writing about. It’s definitely a good thing that this doesn’t have to be an overly technical post. I do enough technical writing for work, producing test and deployment plans and supporting documentation, and that tends to be a bit dry. Nice to be able to write free-form for a little bit.
Now, time for a change of technical pace and a discussion of the latest topics we’ve covered.]
Web Metrics
In going through the reading material for this portion of the latest module, I’ve (re)learned that even within the IT community each segment has its own language. Exit and bounce rates, conversions, visitor metrics, demographics, order values and campaigns: this is obviously where business and sales have influenced IT. Brand new perspective for me. I’ve provided technical expertise in pre-sales, writing proposals and statements of work and executing contracts, but I have never had to work directly in this area. Fortunately, the concepts are easy to comprehend. There are endless examples on the web to look at in this context: virtually any company trying to get visitors to buy goods, download content or fill out a form. Just like brick-and-mortar companies, it’s all about numbers and analysis: who is visiting, why are they visiting, what are they looking for, when are they visiting and where did they come from? Once those questions are answered, marketing can act on them: who to target, how to get their attention (and business) and when is the best time to do so.
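A few of those metrics are simple ratios once the visit data is in hand. This is a minimal sketch, assuming hypothetical session data (an ordered list of page paths plus a "converted" flag) and common simplified definitions: bounce rate is single-page sessions over all sessions, a page's exit rate is the sessions that ended on it over its total pageviews, and conversion rate is converting sessions over all sessions. Google Analytics' exact definitions differ in details and have changed between versions, so treat this as an illustration of the ratios, not of GA's internals.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Session:
    pages: List[str]   # page paths in the order they were viewed
    converted: bool    # e.g. placed an order or downloaded content

def bounce_rate(sessions: List[Session]) -> float:
    """Share of sessions that viewed only one page."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if len(s.pages) == 1) / len(sessions)

def conversion_rate(sessions: List[Session]) -> float:
    """Share of sessions that converted."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s.converted) / len(sessions)

def exit_rates(sessions: List[Session]) -> Dict[str, float]:
    """Per page: sessions that ended on the page divided by all views of it."""
    views = Counter(p for s in sessions for p in s.pages)
    exits = Counter(s.pages[-1] for s in sessions if s.pages)
    return {page: exits[page] / views[page] for page in views}

# Hypothetical visits.
SESSIONS = [
    Session(["/home"], False),
    Session(["/home", "/products", "/checkout"], True),
    Session(["/products"], False),
    Session(["/home", "/products"], False),
]

if __name__ == "__main__":
    print("bounce rate:", bounce_rate(SESSIONS))
    print("conversion rate:", conversion_rate(SESSIONS))
    print("exit rates:", exit_rates(SESSIONS))
```

Note the difference between the two "leaving" metrics: a bounce is a whole session that saw one page, while exit rate is attached to whichever page a visit happened to end on, even after many pages.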
Google Analytics
This portion of the latest module has been one of the more enlightening, dare I say fun, topics so far. Not completely sure why. Maybe it was because I was able to see all the previously taught material put together in a usable, coherent package. Maybe it’s that I've been allowed to view real data from a live system and slice, dice and drill down to my own interactions (I’m pretty confident that the site visit in January from Afghanistan was me, and I actually captured my real-time activity; see image below). I was unable to convince any of my local businesses at home to let me access their Google Analytics data for my homework, but I (along with other students from the class) was granted permission to look at the MISonline data. I was initially unhappy with this, as I wanted to provide some sort of benefit to a company that I do business with, but in retrospect, looking at MISonline has been more beneficial to me. Being able to see how other people have interacted with the site, and how my own interactions affect the cumulative data, is much more educational.
Anyway, I think I have rambled enough about school. I will point out, though, that the next module, covering social network analysis, could be very interesting. I am presuming that mathematical modeling is somehow involved. We’ll see…