Sometimes you need to step away from reporting and analytics and do some data science. You might want to build a new lead scoring model or use a random forest to look for patterns in your churn, and languages such as R or Python offer the power and flexibility to get that done.
Sisense for Cloud Data Teams, with its easy data exploration, visualization, and cross-database analytics is an amazing tool to have available while you code.
- It’s the ideal location to build an input dataset for a model. All your data lives in one place, and cross-database joins allow you to quickly create datasets that contain everything you need.
- You can output your work directly to Sisense for visualization. Having an easy to use tool with advanced data visualization is great for exploring derived datasets or examining the results of a model.
- Model results can be put back into a database and go directly to consumers. For instance, a Python model can build a lead scoring model, and then the scores can be used in dashboards or queries for later analysis or segmentation.
The way you can use Sisense for Cloud Data Teams and Python varies depending on your level of service—more specifically on whether you use only the visualization platform, our cache or our full analytics warehouse. Here are a few different ways you can use Sisense for Cloud Data Teams and Python together—or R— but I’ll use Python as an example.
Connect to Your Own Database
The simplest way to use Python with Sisense for Cloud Data Teams is to connect to an existing database and interact with that database directly. Python allows for easy access to most database types through pyodbc, psycopg2, or other common libraries. You can easily pull data out and write it back to the database.
We’ll use Redshift as an example.
For our connection I will use the psycopg2 library. It can connect to Redshift quickly and easily. If you’ll be doing modeling, I further using the Pandas library for consuming and working with data while using Python. You can read your SQL directly into a dataframe and be modeling in minutes.
import psycopg2 import pandas as pd ## Connect to Redshift Database conn = psycopg2.connect( host="analytics.********.us-west-1.rds.amazonaws.com", user='user', port=5432, password=’password', dbname='analytics') cur = conn.cursor() ## Use cursor to create a table, or run other queries which interact directly with the database cur.execute(“create table new_data as select * from old_data limit 10”) conn.commit() ## Combine psycopg2 with pandas for data interaction and modeling df = pd.read_sql(query, conn) print df.head()
This does require you to maintain an independent analytics database, and doesn’t allow cross-database joins, but you can visualize your results in Sisense and consume your model results in dashboards immediately!
Use the Sisense for Cloud Data Teams CSV Endpoint
For cache users, Sisense for Cloud Data Teams allows the creation of CSV endpoints where a chart’s data can be retrieved over a simple http request. This is a great solution for simple work, and will enable you to use always fresh data from multiple data sources. If you need to work with a small dataset in Python this is a great way to go.
Your first step is to create the table you want to work with as a chart, and then enable public CSV access. This can be found in the advanced section of the series settings tab in the editor.
Once you have enabled access you, can point Python to the link using the urlib2 Python library, and read the data directly into Pandas:
import urllib2 import pandas as pd url = 'https://app.periscopedata.com/api/your-CSV-endpoint’' response = urllib2.urlopen(url) df = pd.read_csv(response) print df.head()
The CSV endpoint is built as a lightweight solution to grab data from a chart, and thus capped to the same 5 MB limit as it’s parent chart. With most tables this is between 5,000 and 25,000 rows—though if you are careful to avoid strings and exclude unnecessary data, you can get more.
Get Real with a Sisense for Cloud Data Teams Warehouse
If you want the most power and flexibility, Sisense for Cloud Data Teams also offers a full data warehouse. This means—in addition to a very fast, dedicated private cluster—you have full read and write access to all your Periscope SQL views.
Using Python similar to the first example, your Sisense views can act as input datasets for your models. You do all your work across any connected database, dataset and data size. This includes datasets in different databases, CSVs, and third-party integrations.
You can push your model outputs directly back into the Sisense for Cloud Data Teams warehouse where your results will be instantly useable across your company and can be recombined with all your data, creating a virtuous cycle. No new databases, no data caps, and no limitations. Your model output can be immediately visualized and shared.