Sunday, September 27, 2015

The new world of analytics and BI. From a text file to the iPad in a couple of hours.



The world has changed in many ways in the last 5 years, and analytics and business intelligence have changed with it.
A friend of mine is running a web proxy, a program that monitors which web pages are visited from inside his company and by whom. This proxy generates a log file in text format which contains the username, the page visited, the number of bytes transferred, etc. The file looks like this:
192.168.15.63 user2 [15/Sep/2015:09:10:09 -0400] "GET http://api.new.livestream.com/accounts/8665913/events/4253600/broadcasts/99302610/availability HTTP/1.1" 304 342 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci3.googleusercontent.com:443 HTTP/1.1" 200 5917 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci5.googleusercontent.com:443 HTTP/1.1" 200 11657 TCP_MISS HIER_DIRECT
192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci6.googleusercontent.com:443 HTTP/1.1" 200 17674 TCP_MISS HIER_DIRECT
192.168.15.63 user2 [15/Sep/2015:09:10:25 -0400] "POST http://livestream.com/analytics/api/track HTTP/1.1" 204 484 TCP_MISS HIER_DIRECT
Based on the contents of the log file, my friend wants to know who the top 10 internet users are, which are the top 10 visited sites, and what the peak hours of daily internet use are. So he asked me to process the file and get this information for him. There are many ways to achieve it, but the one I tested is the following:
I loaded the log file into a Hadoop-based platform (Microsoft Azure HDInsight) and gave it a queryable structure. To achieve this I had to write only two lines of code.
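The two lines of code used on HDInsight are not reproduced in this post. As a purely local illustration of what "giving the file a queryable structure" means, here is a minimal Python sketch that parses the sample lines above and answers my friend's three questions. The field layout is inferred from the sample, the function names (`host_of`, `summarize`) are hypothetical, and ranking users by bytes transferred is my own assumption about what "top users" means.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines:
# ip user [day:HH:MM:SS zone] "METHOD target PROTOCOL" status bytes ...
LINE_RE = re.compile(
    r'(?P<ip>\S+) (?P<user>\S+) \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def host_of(target):
    """Return the host part of 'http://host/path' or 'host:port'."""
    if "://" in target:
        target = target.split("://", 1)[1]
    return target.split("/", 1)[0].split(":", 1)[0]


def summarize(lines):
    """Count bytes per user, requests per host, and requests per hour."""
    bytes_by_user, hits_by_host, hits_by_hour = Counter(), Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        bytes_by_user[m["user"]] += int(m["bytes"])
        hits_by_host[host_of(m["target"])] += 1
        hits_by_hour[m["hour"]] += 1
    return bytes_by_user, hits_by_host, hits_by_hour


# The five sample lines from the log excerpt above.
SAMPLE = [
    '192.168.15.63 user2 [15/Sep/2015:09:10:09 -0400] "GET http://api.new.livestream.com/accounts/8665913/events/4253600/broadcasts/99302610/availability HTTP/1.1" 304 342 TCP_MISS HIER_DIRECT',
    '192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci3.googleusercontent.com:443 HTTP/1.1" 200 5917 TCP_MISS HIER_DIRECT',
    '192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci5.googleusercontent.com:443 HTTP/1.1" 200 11657 TCP_MISS HIER_DIRECT',
    '192.168.15.150 user1 [15/Sep/2015:09:10:16 -0400] "CONNECT ci6.googleusercontent.com:443 HTTP/1.1" 200 17674 TCP_MISS HIER_DIRECT',
    '192.168.15.63 user2 [15/Sep/2015:09:10:25 -0400] "POST http://livestream.com/analytics/api/track HTTP/1.1" 204 484 TCP_MISS HIER_DIRECT',
]

users, hosts, hours = summarize(SAMPLE)
print("Top users by bytes:", users.most_common(10))
print("Top sites by requests:", hosts.most_common(10))
print("Peak hour:", hours.most_common(1))
```

On a real log you would pass an open file object to `summarize` instead of `SAMPLE`. A script like this is fine for one file on one machine; the point of the Hadoop-based approach is that the same questions keep working when the log grows far beyond that.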
Now I have the file loaded and I want to give my friend a familiar tool to query the logged data. It could be Excel (there are ODBC drivers that let Excel connect to HDInsight), but why not something more mobile? Microsoft PowerBI, for example, can connect to HDInsight, and you can run PowerBI on your iPad.
Now I have a log file from a web proxy loaded into a Hadoop-based tool, and my friend can query its contents from his iPad. Amazing, isn't it?
I used technologies like cloud and big data for a trivial task like this, and I did it without installing any tools locally. My point here is: the way we process information and make it available has changed a lot in the last 5 years, and we (IT professionals) have to adapt to those changes and, more importantly, take advantage of them!

Sunday, February 8, 2015

After four decades SAP ERP (now S/4 HANA) is reborn! Literally!

A lot of the biggest companies in the world have been running SAP for decades, maybe not with incredible performance but always with an "acceptable" performance.

What was the secret behind this "acceptable" performance? It was redundancy; basically you bend database normalisation rules and store some aggregated data for the sake of usability and performance.

Imagine if SAP did not use the redundant data (totals tables). When checking the balance of a ledger account you would need to wait (for minutes) because the system would need to read all related transactions first; thanks to the totals tables it only needs to read one record. But the drawback is that every time you save a new transaction, the system needs to update multiple totals tables.

The point here is: if you need multiple views of your data (by account, by vendor, by country, etc.) you can create more and more of those totals tables, but then every time you save a transaction the system has to update all the totals tables you've created, which makes transactions slower.
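The trade-off can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class names, the field names (`account`, `vendor`, `country`, `amount`) and the in-memory dictionaries are hypothetical and have nothing to do with SAP's actual tables.

```python
from collections import defaultdict


class LedgerWithTotals:
    """Keeps one totals table per view, so a balance is a single lookup."""

    def __init__(self, views):
        self.transactions = []
        self.totals = {view: defaultdict(int) for view in views}

    def post(self, txn):
        self.transactions.append(txn)
        # Every extra view adds one more update to every single write.
        for view, table in self.totals.items():
            table[txn[view]] += txn["amount"]
        return len(self.totals)  # number of totals tables touched

    def balance(self, view, key):
        return self.totals[view][key]  # reads one record


class LedgerOnTheFly:
    """Stores only the transactions and aggregates when asked."""

    def __init__(self):
        self.transactions = []

    def post(self, txn):
        self.transactions.append(txn)
        return 0  # no totals tables to maintain

    def balance(self, view, key):
        # Reads every transaction; cheap to write, expensive to read.
        return sum(t["amount"] for t in self.transactions if t[view] == key)


TXNS = [
    {"account": "4000", "vendor": "ACME", "country": "DE", "amount": 100},
    {"account": "4000", "vendor": "BOLT", "country": "US", "amount": 250},
    {"account": "5000", "vendor": "ACME", "country": "DE", "amount": 40},
]

with_totals = LedgerWithTotals(views=["account", "vendor", "country"])
on_the_fly = LedgerOnTheFly()
for txn in TXNS:
    touched = with_totals.post(txn)
    on_the_fly.post(txn)

print("Totals tables updated per write:", touched)
print("Balance of account 4000:", with_totals.balance("account", "4000"))
print("Same balance, aggregated on read:", on_the_fly.balance("account", "4000"))
```

Both ledgers return the same balance. The difference is where the work happens: the first one pays on every write (once per view), the second one pays on every read.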

To further address the performance issue, satellite systems were born. One example is BW (Business Warehouse), which stores an aggregated copy of each transaction that happens in the main system. This way you can run ad-hoc reports on the duplicated data with an "acceptable" response time. Other satellite systems are CRM, APO, etc.

Both the use of totals tables and the introduction of satellite systems successfully ensure performance by replicating main system transactions in multiple locations. This inevitably creates databases that grow huge and become hard to keep in sync.

However, with economic globalisation and phenomena like the internet of things, too many transactions are being generated and systems need to keep pace. It seems that saving and updating the same data in multiple locations is not feasible anymore, or at least not optimal.

Eight years ago at the HPI, one of the SAP board members started a revolution. He said: "Let's redesign enterprise systems with the following premise: A database that has 0 seconds response time". A new vision and SAP HANA were born.

On 3.Feb.2015 SAP S/4 HANA was launched as the result of that vision from eight years ago. This new version of the SAP ERP and its satellite systems doesn't need any of the old performance tricks: no more totals tables, no more saving the same transactions in multiple locations. Everything is SIMPLE, that's what the S in the name stands for, and the 4 is because it's a 4th generation system.

This was an incredible task to achieve, since SAP had to rewrite almost all of its 400 million lines of code. As a result, a system that has, for example, a 593 GB database can now fit in only 8 GB. Yes, the storage space you probably have on your mobile device.

Are you ready for a new world of enterprise software?

If you are interested in more details watch the launch event video here.

Or read this https://blogs.saphana.com/2015/01/14/simple-finance-removes-redundancy-case-materialized-aggregates/.







Sunday, February 1, 2015

Hot start for BI tools in 2015! Microsoft PowerBI and SAP Lumira Edge are here!

Just when the first month of 2015 was about to end, two big pieces of BI news appeared on the Internet:

Microsoft PowerBI and SAP Lumira EDGE are now available; let me explain why this is big news for me.

I've been working with BI for the last 11+ years, specifically with SAP BW / BO. I think that for corporations SAP BW / BO is a spectacular tool, but what about small companies, or small units inside corporations?

In my opinion there was no all-in-one (ETL, presentation, authorisations, scheduling, etc.) BI tool for small companies. This is no longer the case after the release of PowerBI and the use of Office 365. Basically you can use it to:

Extract data from:
- Sources in the cloud (SalesForce, Access Apps, etc.).
- Sources in your office (SQL server databases, Excel/Text files, etc.)

Create nice static printer-friendly reports or stunning dashboards and publish them on your Office 365 portal. You can also schedule them to be refreshed automatically. Then anyone with access can use these reports from a PC or from a tablet.

In less than 4 hours I managed to do the following: install a piece of software that allowed PowerBI to connect to an SQL database on my PC, enabling an automatic refresh of the BI model from my local data, and publish two reports to the Office 365 portal that can be accessed from a PC or a tablet. Wow!

Easy and powerful BI is available now for small companies. Of course there are still some rough edges since it is so new: for example, the error messages that appear when using languages other than English are totally cryptic, and the screenshots in the help do not match the actual screens.

Inside corporations there are highly specialized teams that could really benefit from an in-memory BI solution like Lumira EDGE, which is easy to install and deploy. I've not tested it yet, but it looks promising.

Now January 2015 is gone, but we have two new interesting options for BI.