The world has changed in many ways in the last 5 years. It has changed in analytics and business intelligence too.
A friend of mine runs a web proxy, a program that monitors which web pages are visited from inside his company and by whom. The proxy generates a plain-text log file that records, among other fields, the client IP address, the username, the time of the request, the page visited, the HTTP status code and the number of bytes transferred. The file looks like this:
192.168.15.63 user2 [15/Sep/2015:09:10:09 -0400] "GET http://api.new.livestream.com/accounts/8665913/events/4253600/broadcasts/99302610/availability HTTP/1.1" 304 342 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci3.googleusercontent.com:443 HTTP/1.1" 200 5917 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci5.googleusercontent.com:443 HTTP/1.1" 200 11657 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci6.googleusercontent.com:443 HTTP/1.1" 200 17674 TCP_MISS HIER_DIRECT
192.168.15.63 user2 [15/Sep/2015:09:10:25 -0400] "POST http://livestream.com/analytics/api/track HTTP/1.1" 204 484 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci3.googleusercontent.com:443 HTTP/1.1" 200 5917 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci5.googleusercontent.com:443 HTTP/1.1" 200 11657 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci6.googleusercontent.com:443 HTTP/1.1" 200 17674 TCP_MISS HIER_DIRECT
192.168.15.63 user2 [15/Sep/2015:09:10:25 -0400] "POST http://livestream.com/analytics/api/track HTTP/1.1" 204 484 TCP_MISS HIER_DIRECT
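To make the fields in those lines concrete, here is a minimal Python sketch that splits one line into named parts. The regular expression and the field names (`ip`, `user`, `target`, `cache`, `hierarchy`, and so on) are assumptions inferred from the sample above, not a specification of the proxy's log format, and this is not the code used later on HDInsight.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the sample lines: client IP, user, timestamp,
# request line, HTTP status, bytes, and two proxy result codes.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) (?P<cache>\S+) (?P<hierarchy>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    # Example timestamp: 15/Sep/2015:09:10:16 -0400
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return record


sample = (
    '192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] '
    '"CONNECT ci3.googleusercontent.com:443 HTTP/1.1" 200 5917 TCP_MISS HIER_DIRECT'
)
record = parse_line(sample)
print(record["user"], record["target"], record["bytes"], record["timestamp"].hour)
```

Lines that do not fit the assumed layout come back as `None`, so a caller can skip or count them instead of failing.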
Based on the contents of the log file, my friend wants to know who the top 10 internet users are, which are the top 10 visited sites, and what the peak hours of daily internet use are. He asked me to process the file and get this information for him. There are many ways to do it, but the one I tested is the following:
I loaded the log file into a Hadoop-based platform (Microsoft Azure HDInsight) and gave it a queryable structure. To achieve this I had to write only two lines of code.
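Once the log has a structure, the three questions above are simple aggregations. The sketch below illustrates them in plain Python on a few of the sample lines; it runs locally and is not the code used on HDInsight. The helper names `host_of` and `summarize` are hypothetical, and ranking users by bytes transferred (rather than by number of requests) is an assumption about what "top users" means.

```python
from collections import Counter
from urllib.parse import urlsplit

SAMPLE = [
    '192.168.15.63 user2 [15/Sep/2015:09:10:09 -0400] "GET http://api.new.livestream.com/accounts/8665913/events/4253600/broadcasts/99302610/availability HTTP/1.1" 304 342 TCP_MISS HIER_DIRECT',
    '192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci3.googleusercontent.com:443 HTTP/1.1" 200 5917 TCP_MISS HIER_DIRECT',
    '192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci5.googleusercontent.com:443 HTTP/1.1" 200 11657 TCP_MISS HIER_DIRECT',
]


def host_of(target):
    """Extract the host from 'http://host/path' or from a CONNECT 'host:port' target."""
    if "://" not in target:
        target = "//" + target  # lets urlsplit read 'host:port' as a network location
    return urlsplit(target).hostname


def summarize(lines, top_n=10):
    """Aggregate bytes per user, hits per site and hits per hour of day."""
    bytes_by_user = Counter()
    hits_by_site = Counter()
    hits_by_hour = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 11:
            continue  # skip lines that do not have the assumed 11 space-separated tokens
        user, timestamp, target, size = parts[1], parts[2], parts[5], parts[8]
        bytes_by_user[user] += int(size)
        hits_by_site[host_of(target)] += 1
        # timestamp token looks like '[15/Sep/2015:09:10:16'; the hour follows the first ':'
        hits_by_hour[int(timestamp.split(":")[1])] += 1
    return {
        "top_users": bytes_by_user.most_common(top_n),
        "top_sites": hits_by_site.most_common(top_n),
        "peak_hours": hits_by_hour.most_common(top_n),
    }


summary = summarize(SAMPLE)
for name, ranking in summary.items():
    print(name, ranking)
```

On a platform like HDInsight the same grouping and counting would be expressed as queries over the structured data rather than as a local loop, which is what makes it practical for large log files.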
Now I have the file loaded and I want to give my friend a familiar tool to query the logged data. It could be Excel (there are ODBC drivers that let Excel connect to HDInsight), but why not something more mobile? Microsoft Power BI, for example, can connect to HDInsight, and you can run Power BI on your iPad.
So now I have a log file from a web proxy loaded into a Hadoop-based tool, and my friend can query its contents from his iPad. Amazing, isn't it?
I used technologies like cloud and big data for a trivial task like this, and I did it without installing any tools locally. My point here is: The way we process and make information available has changed a lot in the last 5 years, and we (IT professionals) have to adapt to those changes, and more importantly, we must take advantage of them!
