Sunday, March 9, 2014

Have You Checked the Network?

Finally.

It's taken 20+ years, but I'm finally seeing some of my undergrad studies in action in the real world.

Well, that's not entirely correct. I use the programming, analysis and process-oriented skills pretty much every day. But I've never applied the actual course material. Really, how often do you look at eigenvectors, Cauchy-Riemann equations, Fourier series or planetary life-cycle modeling as an infrastructure consultant in the IT world? Yep, never.

So... I majored in math (by choice). The astrophysics minor was a fluke. I wanted to minor in English but the College of Arts & Sciences policies said otherwise. I was required to minor in a related field so I took some classes I thought would be beneficial and figured on getting physics. The astronomy classes were literally for fun (and to hang out with my very geeky friends) with no expectation that they would be "useful" towards my degree. But before I graduated they were reevaluated and renumbered. I didn't know I had the split Astronomy/Physics minor until just before graduation when I completed my degree check paperwork. Turns out I did become a rocket scientist after all :) I digress: back to the topic at hand.

One of my core classes was discrete mathematics in computer science. We covered logic, proofs, theories, etc. The instructor tried to make it interesting and relevant, which was difficult with a room full of bored undergrads with other endeavors on their minds (in my case, Z's Pizza & beer with friends, followed by more cold beer at Gentle Ben's). I don't recall much of the "computer science" portion as we never used a computer in class, which was a bit strange. The infrastructure on campus was fairly robust for the time period. Open labs were all over and students had access to GAS, VMS and other systems for research, class projects, email and even surfing. Quite a contrast to the outside where only a small percentage of the population had "real" internet access from home (everything was pretty much dial-up and AOL was king). So with a small bit of knowledge about computer communications, etc., the ideas behind networks and connectedness started to make a bit of sense when we covered network & graph theory. At the time it seemed limited to computer networks and not much else.

Now on the other hand... Wow, what a change. With the vast amounts of data available, low-cost high-throughput computing (including distributed architectures), all the knowledge I brain-dumped years ago is actually useful. Social networking analysis, pathogen transmission and a myriad of other fields of study are relying heavily on the mathematics describing networks and providing models for prediction.

Very cool. Almost makes me want to go back to school in the math department... Well, maybe not.

So, as part of the course we've been provided the opportunity to do some of our own network analysis without needing to grind through the math. Good thing; I can't remember how to do any of the calculations. And, yes, it's pretty cool stuff. Being able to install a small software package, download a data set and start displaying relevant information in a short time (a couple of hours) without having to understand the mechanics under the covers is, for lack of a better term, liberating. Much like the web analytics we did earlier in the session, the tools available for network analysis provide even a novice with powerful information that can be used for making informed decisions. These can be as simple as figuring out who to contact for leads on sales or jobs, or figuring out who is the key player responsible for dissemination of bad information.
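For a feel of what such a tool does under the covers, here is a minimal sketch of one of the simplest "key player" measures, degree centrality: the share of other people each person is directly connected to. The edge list and names are made up for illustration, and this is not necessarily the measure the course software uses.

```python
from collections import defaultdict

# Hypothetical "who talks to whom" edge list; the names are invented.
edges = [
    ("ana", "ben"), ("ana", "cho"), ("ana", "dev"),
    ("ben", "cho"), ("dev", "eli"), ("eli", "fay"),
]

def degree_centrality(edge_list):
    """Return each node's share of the other nodes it is directly linked to."""
    neighbors = defaultdict(set)
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

scores = degree_centrality(edges)
key_player = max(scores, key=scores.get)  # node with the most direct contacts
print(key_player, round(scores[key_player], 2))
```

Degree is only one lens. A real analysis would likely also look at measures such as betweenness, which favor the people who bridge otherwise separate groups.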

Looking forward to experimenting with the tools some more to see what other nuggets of information can be found...

Sunday, February 23, 2014

Alrighty... So it's been a bit since I last posted...

Alrighty...

So it's been a bit since I last posted on here and wow, we have covered quite a bit of material in the last few weeks.

It doesn't look like much when you list them out:
  • Advanced Star Schema Design
  • Data Quality Analysis
  • Dashboard Design and Analysis
  • Web Metrics
  • Google Analytics


But... these can get very detailed. In fact, you could probably spend a career just focusing on any of these.

Advanced Star Schema Design
So, now that we have submitted the homework and there has been a little time for the material to sink in, the design process is making more sense. Like I said before, transitioning mentally from trying to make everything 3NF and even 4NF to "just clump it all together" takes a lot of effort. Also, truly understanding what you are trying to model and get results for was a bit difficult for me when doing the homework. I think I was over-analyzing the process and just making it too difficult when a simple approach would have sufficed. Going through the homework was a great learning experience on “simplification.” Fortunately, I work in an environment where there is quite a bit of unprocessed data, and when I can finally take a break from school I'll look into how I would design the schemas for ingesting it into a data mart/warehouse. Too bad I can only get to it on special networks.

Data Quality Analysis
That star design stuff leads right into DQA. The premise is pretty much a no-brainer: garbage in = garbage out (the actual process, on the other hand, is not so quick and easy). The designs for collapsing all the operational data tables into a "small" set of fact and dimension tables may be great, but if the data in those tables is not consistent, the reports are never going to produce valid results and will always be questionable. So before pushing the data, it should be analyzed for inconsistencies (data profiling) and cleaned up. This could be a very time-consuming process if it had to be done manually, even for relatively simple data sets. Fortunately, there are programs/tools that are designed specifically for this task. A good thing, especially when one considers that there are organizations with millions if not billions of pieces of data, and the profiling and cleaning process may need to be done numerous times before the resulting set is deemed to be of high enough quality for ingestion.
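As a rough illustration of what data profiling looks for, here is a minimal sketch that summarizes a column (blanks, distinct values, how often each occurs) and flags duplicate IDs. The rows, column names and planted inconsistencies are all hypothetical; dedicated profiling tools do far more than this.

```python
from collections import Counter

# Hypothetical customer rows; the inconsistencies are planted on purpose.
rows = [
    {"id": "1", "state": "AZ", "signup": "2014-01-05"},
    {"id": "2", "state": "az", "signup": "2014-01-09"},
    {"id": "3", "state": "", "signup": "01/12/2014"},
    {"id": "3", "state": "Arizona", "signup": "2014-02-01"},
]

def profile(rows, column):
    """Summarize one column: blank count and frequency of each distinct value."""
    values = [r[column].strip() for r in rows]
    blanks = sum(1 for v in values if not v)
    counts = Counter(v for v in values if v)
    return {"blanks": blanks, "distinct": len(counts), "counts": counts}

state_profile = profile(rows, "state")
id_profile = profile(rows, "id")
# An ID that appears more than once is a candidate duplicate record.
duplicate_ids = [v for v, c in id_profile["counts"].items() if c > 1]
print(state_profile["blanks"], state_profile["distinct"], duplicate_ids)
```

Three spellings of the same state, one blank and one repeated ID in four rows: exactly the kind of thing that would quietly skew a report if it were loaded as-is.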

Then, once the data is all tidied up and the anomalies are handled, it can be loaded into a data mart/warehouse. Cool! (yes, geeky)

Dashboard Design and Analysis
Now that all that operational data has been collapsed, cleaned up and loaded, it’s time to do something with it. What? This class is “Business Intelligence” so it makes sense that we would go over how to extract useful (intelligent) information from all that data and provide it to some business-type folks that can use the results for making decisions. This is where dashboards (and analysis of the information they display) come into play.

The theory is, a dashboard should provide a quick-glance summary of some data set (facts and dimensions) and provide meaning to the business. It should be simple and not require any cross-referencing or lookups to understand. Not much different than a dashboard in a car, for example, speedometer, gas gauge, odometer, maybe engine temperature and battery charge. Granted, a business dashboard would have a little more, like graphs and summaries, but this is the general idea.

The key is figuring out exactly what data to pull for displaying. This is where a dashboard designer must understand not only the dashboard design tools and underlying data, but also the target audience. In many cases, the same sets of data may need to be presented from different perspectives to accommodate the audience's focus. Network engineers may want to know who is utilizing the most bandwidth and which sites are being accessed most frequently, while the finance department may only want to see the costs per user or department for the leased line the internet connection is coming through. And somewhere in the middle may be a manager who wants to see a combination to determine who is costing the company money compared to their productivity.

The visual piece of the dashboard that I just can’t get into is scorecards. Got it, understand their use. I like to see them and use them. But designing them? I’m not the guy. It’s not that I can’t be creative or “visual” (heck, I’ve been doing photography as a hobby since I was a kid), but building pretty buttons and graphs for someone else isn’t my thing. Maybe it’s lack of exposure, experience or need. Who knows, maybe I’ll change my mind. I did with respect to MS SQL – swore up and down for years that I would never do databases. Now… I work with MS SQL, PostgreSQL and Oracle, so much so that I have been put on projects just because of my SQL experience. So there is hope :)

[Hmmm. Re-read what I’ve written so far. If I didn’t know any better I’d say I knew what the heck I was writing about. It’s definitely a good thing that this doesn’t have to be an overly technical post. I do enough technical writing for work producing test and deployment plans and supporting documentation and that tends to be a bit dry. Nice to be able to write free-form for a little bit.

Now, time for a change of technical pace and a discussion of the latest topics we’ve covered.]

Web Metrics
In going through the reading material for this portion of the latest module, I’ve (re)learned that even within the IT community each segment has its own language. Exit and bounce rates, conversions, visitor metrics, demographics, order values and campaigns: this is obviously where business and sales have influenced IT. Brand new perspective for me. I’ve provided technical expertise in pre-sales, writing proposals and statements of work and executing contracts, but never had to work directly in this area. Fortunately, the concepts are easy to comprehend. There are endless examples on the web to look at in this context; virtually any company trying to get visitors to buy goods, download content or fill out information. Just like brick and mortar companies, it’s all about numbers and analysis: who is visiting, why are they visiting, what are they looking for, when are they visiting and where did they come from? Once there is an understanding of the answers, actions can be taken in terms of marketing: who to target, how to get their attention (and business) and when is the best time to do so.
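Two of those terms are easy to mix up, so here is a minimal sketch of how they differ, assuming the usual definitions: bounce rate is the share of sessions that viewed only one page, and a page's exit rate is the share of its pageviews that were the last in a session. The sessions and page paths are invented for illustration.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of pages one visitor viewed.
sessions = [
    ["/home"],
    ["/home", "/products", "/cart"],
    ["/products"],
    ["/home", "/products"],
]

def bounce_rate(sessions):
    """Share of sessions that viewed exactly one page."""
    return sum(1 for s in sessions if len(s) == 1) / len(sessions)

def exit_rates(sessions):
    """For each page: exits from that page divided by its total pageviews."""
    views = Counter(page for s in sessions for page in s)
    exits = Counter(s[-1] for s in sessions)
    return {page: exits[page] / views[page] for page in views}

site_bounce = bounce_rate(sessions)
page_exits = exit_rates(sessions)
print(site_bounce)
print(page_exits)
```

A page can have a high exit rate without being a problem (an order confirmation page, say), which is why the two numbers are reported separately.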

Google Analytics
This portion of this latest module has been one of the more enlightening, dare I say fun, topics so far. Not completely sure why. Maybe it was because I was able to see all the previously taught material put together in a usable, coherent package. Maybe it’s that I've been allowed to view real data from a live system and slice, dice and drill down to my own interactions (I’m pretty confident that the site visit in January from Afghanistan was me and I actually captured my real-time activity, see image below). I was unable to convince any of my local businesses at home to let me access their Google Analytics data for my homework, but I (along with other students from the class) was granted permission to look at the MISonline data. I was initially unhappy with this as I wanted to be able to provide some sort of benefit to a company that I do business with, but in retrospect, looking at MISonline has been more beneficial to me. Being able to view how other people have interacted with the site and how my own interactions affect the cumulative data is much more educational.


Anyway, I think I have rambled enough about school. Although, I will point out that the next module covering social network analysis could be very interesting. I am presuming that mathematical modeling is somehow involved. We’ll see…

Saturday, February 8, 2014

Did you score? Is it Normal or Not?

In the last two weeks, we've covered balanced scorecards and a decent intro to BI/DW.

Balanced scorecards just make sense to me. Having worked for large enterprises pretty much my entire IT career, I find that trying to judge whether an organization is succeeding or not based on just one facet, specifically the bottom line, is very narrow-minded. Investments in future technology, new processes, "self"-analysis (company perspective) and the people behind them are what have made good companies great. The goal of a company (or at least it should be) is to grow, maintain profitability and keep a "healthy" workforce that is constantly contributing back to the company. Figuring out how this can be measured, taken advantage of and improved is the key. This is the "balanced" part: looking at a company from essentially a holistic perspective, and not from a pedestal through rose-colored glasses at the quarterly financials.

I know that there is often hesitation on the part of management to "take it all in," whether it's looking in the mirror, so to speak, and seeing how the internal workings are really working (or not, in many cases), how others really view the company (like the paying customers who may not be as loyal as management perceives) or what is happening behind those closed doors where all the R&D funding goes. But it should be done (I'm not going to go on some diatribe rationalizing why; there are plenty of books, articles, etc. that do a much better job than me).

My background is heavy on the technical and process aspects of IT (requirements, design, development, test, architecture, analysis, implementation, etc.); I have never really been involved with the financial aspects of the companies I've worked with or for. But... from a hands-on, anecdotal perspective, I can attest to the value of the internal creativity and innovation piece of the balanced scorecard. I witnessed it regularly (and even participated).

The moments of "outlandish" ideas born on Friday afternoons while consuming MAPI beers (it's an Exchange thing :) sometimes weren't just local pilsner & IPA induced ramblings; they were real ideas. Ideas that were still valid come Monday morning. We (Microsofties) were given fairly free rein to chase down these ideas and run them to ground. And sometimes, they weren't too far off-base from where the company wanted to be. And if the ideas made sense and had future value, we, the ones who came up with them, would present them, not some manager. And with each approval up through the reporting structure, you got closer to the "big one", the scary, yet fulfilling "Bill G Review" (back in the good ol' days when he was running the joint). His nod could put you and your newly anointed position and project on a whole other level. Now, not every idea would make it that far; some were already in progress elsewhere, some were "killed" early, etc. The thing was, everyone knew that even the most junior, lowest node in the reporting chain could have an impact, and a major one at that.

My friends worked in R&D and had a show-and-tell in 1999. They said there wasn't going to be a lot of hoopla and fanfare, but to come on over anyway. I would think it was cool. What they showed was hard to fathom at the time and nearly impossible to display, literally, given the paucity of the hardware at the time: a PC workspace that presented everything in high-definition 3-D using on-screen touch controls. I think the code-name was Neptune or something like that. It wasn't feasible at the time for consumer use, but it had potential and would eventually influence things like Aero and the Windows interfaces since then as well as the now prevalent touch-screen interfaces on all Windows platforms.

So did Microsoft score? Repeatedly. Was the investment in R&D and the average employee worth it even though there was no immediate ROI? You bet. Is Microsoft the only company to realize this? Not a chance. Just look at the competition in the IT market alone: Google, Apple, etc. They all "get it" in some way or another.

I could go on about how programmatically the balanced scorecard concept is presented up and down the management chain, needs buy-in from all business units, must be part of the corporate culture, is an iterative/refining process, etc. but I won't. The fact is, this mirrors how companies should be addressing their information security and governance models as well as any other programs that are "culture changing." The finer details and implementations may be different, but their overall approaches are pretty much the same if they are to be successful.

Now, what I get, but have trouble reconciling, is this not-so-normal view of the world that's been presented recently. OK, to be specific: "denormalized tables." I have spent a lot of time working with databases to ensure they are 3NF and in many cases 4NF, and this whole data warehousing stuff with facts, dimensions, star schemas, snowflakes, blizzards, surrogate keys, step keys, etc. has thrown me off kilter (yes, some of those are made up :).

The idea of adding redundant data into a table just seems wrong. But I've never had to produce the type of data that a DW spits out, and I guess it's gonna be ok. The hard part is figuring out the balance between too much and too little. Is a star schema sufficient or should it be a snowflake? Should there be any normalized tables, or strictly dimensions with all the redundant information? This is where I am as I attempt to noodle through the design for my current homework assignment. I guess it's time to just head to the whiteboard and work it out.
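To make the trade-off concrete, here is a minimal sketch of a tiny star schema using Python's built-in sqlite3 module. The table names, columns and sales figures are all hypothetical and have nothing to do with the actual homework. The category name is deliberately repeated on every product row; a snowflake design would move it to its own table and add a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimension: category_name is repeated per product on purpose.
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,  -- surrogate key
        product_name  TEXT,
        category_name TEXT
    );
    -- Fact table: one row per sale, pointing at the dimension by surrogate key.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Widget', 'Hardware'),
        (2, 'Gadget', 'Hardware'),
        (3, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES
        (1, 2, 20.0), (2, 1, 15.0), (3, 4, 40.0), (1, 1, 10.0);
""")

# Reporting by category needs only one join because the dimension is flat.
totals = conn.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()
print(totals)
```

The redundancy that looks wrong from a 3NF point of view is what keeps the reporting query this short: the cost is extra storage and a little more care when loading, in exchange for fewer joins at query time.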

On that, I'm out for the night. Well, not exactly. Head in at midnight for scheduled IT outages across Afghanistan for a couple of hours. Gonna be a longer night than normal...

(Wow this turned out to be longer than I expected. Need to work on short posts.)

Sunday, January 26, 2014

Get this going...

So, the first week of MIS 587, Business Intelligence, is coming to a close.

Quite a bit of introductory/familiarization material and a lot of thinking; more than the other grad classes I've taken so far. And the adjectives that first come to mind are: overwhelming, interesting, mind-boggling, thought-provoking and scary.

So many disparate thoughts that I can't even decide where to start... Is it the sheer volume of data that is being generated and shared and increasing at an alarming rate, the quandary of where the line is drawn between "public" and "private" information, defining who actually owns the tidbits of seemingly useless "personal" data scattered everywhere, or figuring out whether all that data can be used for the betterment of humanity or for nefarious purposes? And then there's, "what's next?"

With all this data, what is beyond targeted marketing, purchase trend analysis and understanding how a cold propagates throughout a region and eventually stops? Are we getting near the point of being able to reliably predict whole population behaviors based on conglomerated individual actions? Was the underlying "science" and premise of Asimov's Foundation Series really far-out science-fiction or simply real, undeveloped concepts that we are only now able to understand, quantify and use?

"Big data" analysis and application use almost seems limitless. I hope that this class sheds light on it in a meaningful manner useful in my future; seems that way so far...

It's after midnight (here in Kabul), time to head back to my quarters and get some sleep.