Sunday, December 23, 2018

2019, the year of the DWEB

The WEB is a place to do a little of everything. Life online can be very diverse and exciting. Most of our online adventures happen on the domains of corporations (Google, Facebook, etc.) that record our actions and exploit them to generate revenue through targeted advertising. Some innovators are building platforms beyond the reach of the big corporations. We can still shop, find romance, and so on. The difference is that your data stays under your own control and remains encrypted. The DWEB (Distributed WEB) aims to create a web with more privacy that is less focused on monetizing your information.

The WEB as it exists today

The DWEB

The DWEB is still young and has some rough edges, but the technology has matured enough to offer working services. Join in!

Graphite Docs - Similar to Google Docs, but instead of handing your documents over to an advertising giant, your documents are encrypted with a key that only you hold.

Textile - Works like a private Instagram account: your shareable photos are encrypted and sent to decentralized storage maintained mostly by volunteers. There are versions for iOS and Android.

Blockstack - The DWEB needs new development tools.

D.Tube - The stars of these decentralized videos are not chasing advertising money.

Monday, August 20, 2018

Composite Models: Live data (from SAP HANA) and imported data from Excel in one single model using Power BI

Hi,

In July 2018, Microsoft announced the availability of "Storage Mode" in Power BI (still in preview, not ready for production), giving you the chance to decide, for each table, whether it will be:

- Import (the data is copied into the model)
- Direct Query (Live) (queries are sent to the source when the report runs)
- Dual (the table can act as Import or DirectQuery, depending on the query)

This is a great option, especially if, for some models:

a) You do not want to Import (copy) the whole dataset into your model because it is too big or would add latency.

b) You need to combine DirectQuery (live) data with other data (before July 2018 this was not possible: when you were using a direct query, there was no way to also load data).

A possible scenario: your actuals come from a HANA system (BW, S/4, etc.) and usually amount to millions of records, so you do not want to load them into Power BI (point a). For the actuals you use a live (DirectQuery) connection to HANA, but for your projections you get the data from Excel (Import mode).

The schema will look like this:


1) Through the direct query (live) connection you are getting the Actuals and Plan. The table is called "Hana_Live" and the storage mode is "Direct Query", as can be seen here:
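For reference, a minimal Power Query (M) sketch of what the source query behind a table like "Hana_Live" could look like is shown below; the host, port, package and view names are hypothetical, and the "Direct Query" storage mode itself is chosen in the connector dialog / Model view, not in the M code.

    let
        // Connect to the SAP HANA system (hypothetical host and SQL port)
        Source = SapHana.Database("hanahost.example.com:30015"),
        // Navigate to a hypothetical calculation view exposing Actuals and Plan
        Package = Source{[Name = "my.package"]}[Data],
        Hana_Live = Package{[Name = "CV_ACTUALS_AND_PLAN"]}[Data]
    in
        Hana_Live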

2) From the Excel file, we loaded the projections. The table is called "Projection_Excel" and the storage mode is "Import".
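The query behind the imported table is a standard Excel import; a minimal M sketch (file path and sheet name are hypothetical) could look like this:

    let
        // Read the workbook that contains the projections (hypothetical path)
        Source = Excel.Workbook(File.Contents("C:\Data\Projections.xlsx"), null, true),
        // Pick the projections sheet and promote the header row
        Sheet = Source{[Item = "Projections", Kind = "Sheet"]}[Data],
        Projection_Excel = Table.PromoteHeaders(Sheet, [PromoteAllScalars = true])
    in
        Projection_Excel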



3) The last step is to combine these two tables so the report can be built. Because of the model we need to make a union, and yes, it is possible to make a union between an Import table and a DirectQuery table with the standard DAX expression: UnionCombined = UNION(Hana_Live, Projection_Excel)


Another way of showing this is:


As you can see, this is a great new feature that enables a lot of new modeling options; hopefully it becomes generally available soon.

Any experiences with this?

I also found it interesting that the documentation here:
https://docs.microsoft.com/en-us/power-bi/desktop-storage-mode
says that:
The following multidimensional sources cannot be used with composite models:
  • SAP HANA
  • SAP Business Warehouse
  • SQL Server Analysis Services
  • Power BI datasets
but enabling the option to treat SAP HANA as a relational source makes it possible to use HANA in composite models.

Thursday, July 26, 2018

Power BI connector to BW (BEx queries), version 2.0 - First Impressions

Introduction

In June 2018, Microsoft released version 2.0 of the connector for SAP BW. You can read about the announcement here.

As you can see there, one of the new features is:
"Ability to retrieve several million rows of data, and fine-tuning through the batch size parameter."

I decided to give it a try.

Keep in mind that SAP says "BEx queries are not made for mass data extraction", as stated in SAP Note 1968598 - Mass data in BEx queries: Options and Workarounds.
 
Also, note that you may need to fine-tune some memory parameters on your application server to avoid memory problems, which at some point will occur anyway, since BEx queries are not meant for mass data extraction.
 

The configuration 

The source system

SAP BW/4HANA SP5, deployed from CAL.SAP.COM to Google Cloud Platform. The dataset consists of 2.4 million records, with one characteristic (dimension) that is a unique ID.

This unique ID is generated by ABAP code in the transformation and is the key of the DSO. This happens in real-life scenarios: dimensions with millions of different values, such as billing document numbers, accounting document numbers, etc.

I loaded the data into two targets, both advanced DSOs: one where the unique key (the dimension with 14 million different values) was an InfoObject (which means SID generation), and another where the unique key was a HANA field (no SID generation).

On top of those advanced DSOs, I created queries, and Power BI connects to those queries.

The data I loaded is from here.

The Target

I was using Power BI Desktop 2.59.5135.781, 64-bit (June 2018).

Note: Since this new connector is in BETA, I did not test it using the Power BI gateway, which is a must (in most cases) for production use.

The first parameter tested - Batch Size

 

Image 1 - The new connector screen, showing all available parameters (the Batch size parameter is highlighted)

There is not much documentation from Microsoft about the execution mode, but in transaction MDXTEST, if you press "Execute...", you can find the same parameters, and from there you can do some research.

Image 2 - Correlation between PowerBI parameters and MDXTEST parameters


My intuition for this parameter was as follows: a smaller batch size means more RFC calls and less memory consumption; a bigger batch size means fewer RFC calls and more memory consumption.
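For reference, below is a hedged sketch of how I believe these options end up in the generated M source expression; the host, system number, client and the gzip execution mode are assumptions on my side, and only Implementation, ExecutionMode and BatchSize are the parameters relevant to this test.

    let
        Source = SapBusinessWarehouse.Cubes(
            "bw4hana.example.com",   // application server (hypothetical)
            "00",                    // system number (hypothetical)
            "100",                   // client (hypothetical)
            [
                Implementation = "2.0",                                        // the new connector
                ExecutionMode  = SapBusinessWarehouseExecutionMode.BasXmlGzip, // assumed execution mode
                BatchSize      = 50000                                         // the parameter under test
            ]
        )
    in
        Source

In the tests below I only varied the batch size and left everything else untouched.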

With batch size 50 000

Each batch was around 50k records, which is just a coincidence; depending on the width of your dataset, this can vary.
Here is a short video (30 seconds) where you can see the load:

And here is a screenshot from transaction ST04:

In 1 we can see the memory consumption; in 3 (R=30) we can see that there have been 30 RFC calls (30 x 50 000 = 1 500 000 records at the moment of the screenshot).


With batch size 100 

Each batch was around 600 records. Here is a short video (30 seconds) where you can see the load:
 And here is a screenshot from transaction ST04:
In 1 we can see the memory consumption, strangely the same as in the previous case; in 3 (R=925) we can see that there have been 925 RFC calls (925 x 600 = 555 000 records at the moment of the screenshot).

This load was noticeably slower.

With batch size 1 000 000

With this batch size, the load ended with an out of memory (TSV_TNEW_PAGE_ALLOC_FAILED) error in BW.
You can see the video here:


This load was noticeably faster but did not finish successfully.

Conclusion 

Start with a small batch size and increase it to get better performance. You will know it is too big when the load fails. Always experiment in a sandbox system, never in production!

This parameter is a great addition to the connector!

In a new blog post I will test the other parameters!

Appendix A 

The memory parameters of the application server (Screenshot from ST02)