Open Source Summit Europe 2018

Edinburgh International Conference Centre

The Future of AI is Data…In More Ways than You Think

Eric Berlow, Co-Founder, Chief Science Officer, Vibrant Data Inc.

Tim Berners-Lee

Personal data stored distributed

The What-If Tool: Code-Free Probing of Machine Learning Models

Google’s What If Tool

OpenMappr, explore complex networks

OpenMappr

Building an Open Source Software Culture at Microsoft

Stephen Walli, Principal Program Manager, Microsoft

“Culture eats strategy for breakfast”

Creating an IoT Data Layer for Collecting, Storing, Analyzing and Reacting to Data

David G. Simmons, @davidgsIoT, InfluxData

Distributed data collections

Data collected at multiple collection points
Remote collections feed back-end system of record
Distributes data collection load
More tolerant of network outages, etc.

Data Layer Architecture

Data collected at the edge, where it is generated
Edge collectors also capable of analysis
Edge collectors handle local events, etc.
Down-sampled data forwarded to the backend on a network-available basis
- Lower network costs
- More fault tolerant
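
To make the down-sampling idea concrete for myself, here is a minimal sketch of what an edge collector might do before forwarding data: average raw readings into fixed time windows. The function name `downsample`, the 60-second window, and the sample readings are my own assumptions for illustration, not something shown in the talk.

```python
from collections import defaultdict
from statistics import mean


def downsample(readings, window_seconds):
    """Average (timestamp, value) readings into fixed-size time windows.

    Returns a list of (window_start, mean_value) tuples sorted by window start.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Align each reading to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical temperature readings: (seconds since start, degrees C)
raw = [(0, 20.0), (10, 21.0), (20, 22.0), (60, 30.0), (70, 32.0)]
summary = downsample(raw, window_seconds=60)
print(summary)  # [(0, 21.0), (60, 31.0)]
```

Five raw points become two summary points, which is where the lower network cost comes from. The raw data can stay on the edge device for local analysis.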

IoT Data Layer

What is IoT Data?
- sensor@time - that’s time series data!
IoT data MUST be
- Timely - ingestion rates and query efficiency are key
- Accurate - data integrity and platform reliability are important
- Actionable - data visualization, anomaly detection & alerting are essential
IoT deployments are struggling to find an efficient, scalable data platform that meets all of these criteria

Apache Kafka - “A System Optimized for Writing”

Bernhard Hopfenmüller, ATIX AG

IRC: Fobhep, github.com/Fobhep

Interesting session giving insights into Kafka.

Kafka - Docker

Toro Kernel, A Dedicated Kernel for Microservices

Matias Vara Larsen, Silicon Gears & Cesar Bernardini, Barracuda

TORO

Toro is a simple kernel that allows microservices to run efficiently in VMs thus leveraging the strong isolation VMs provide.

What is it?

Toro is a simple kernel that provides a dedicated API to develop microservices. We propose two kinds of sockets to build microservices: blocking and non-blocking. Blocking sockets are good for IO-intensive microservices, whereas non-blocking sockets are good for microservices that can serve a request without blocking. When a microservice executes in Toro, it runs alone in the system and can therefore use all of the VM's resources.
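
As a reminder of the general difference between the two socket modes, here is a small sketch using Python's standard `socket` module. This is not Toro's API (Toro has its own dedicated API); it only illustrates blocking versus non-blocking `recv()` semantics, with a local socket pair standing in for a client and a microservice.

```python
import socket

# A connected pair of sockets stands in for a client and a microservice.
client, service = socket.socketpair()

# Non-blocking: recv() returns immediately; with no data pending it raises
# BlockingIOError instead of waiting.
service.setblocking(False)
try:
    service.recv(1024)
    had_data = True
except BlockingIOError:
    had_data = False

# Blocking: recv() waits until the peer has sent something.
client.sendall(b"ping")
service.setblocking(True)
request = service.recv(1024)

print(had_data, request)

client.close()
service.close()
```

In the non-blocking case the caller has to decide what to do when nothing is ready yet, whereas the blocking call simply parks the thread until data arrives.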

What is it?

A dedicated kernel for multi-threading applications.

How it works?

Toro is a set of libraries that compile within the user application, i.e., the microservice. The user can choose which components should be included, e.g., drivers, filesystems, etc. This results in a binary that can run on top of modern hypervisors like KVM, Xen or VirtualBox. Once the kernel has been initialized, the microservice starts to execute. The microservice and the kernel execute at the most privileged level and share the memory space, i.e., flat memory model. In this sense, Toro only supports threads and does not use paging.

How it works

Summary

Toro is a kernel dedicated to running microservices
Toro provides a dedicated API to specify microservices
Toro's design is improved in four main areas:
- Booting time and building time
- communication to the kernel
- memory access
- networking

Toro wants you

Talked to César Bernardini (mesarpe@gmail.com) from Argentina.

Connected César to Alex Ellis.

Introduction to Natural Language Processing with Python

Barbara Fusinska, barbarafusinska.com, Google

KataCoda: NLP with Python

Reuters dataset

Reuters-21578 dataset
Documents assembled and indexed with categories
Appeared in the Reuters newswire and made public

Bag of words

Documents:

John likes to watch movies. Mary likes to watch movies too.
John also likes to watch football games.

Vocabulary:

[also, and, both, football, …]
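
A minimal sketch of the bag-of-words idea in plain Python, using the two example documents above. The helper names `tokenize` and `bag_of_words` are my own, and the vocabulary here is built only from these two documents, so it differs from the slide's vocabulary, which seems to come from a larger set.

```python
import re
from collections import Counter

documents = [
    "John likes to watch movies. Mary likes to watch movies too.",
    "John also likes to watch football games.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Vocabulary: sorted unique tokens across all documents.
vocabulary = sorted({token for doc in documents for token in tokenize(doc)})


def bag_of_words(text):
    """Return one count per vocabulary word, ignoring word order."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


vectors = [bag_of_words(doc) for doc in documents]
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each document becomes a vector of word counts with one slot per vocabulary word; word order is discarded entirely.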

Stemming

Reduce the words to their root form:

likes => like
movies => movie
watched => watch

Vocabulary:

[also, football, games, john, like, mary, … ]
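
A toy suffix-stripping sketch that reproduces the three examples above. The function `toy_stem` and its rules are my own simplification for illustration; real stemmers (for example the ones shipped with NLTK) use far more elaborate rules and can produce different stems.

```python
def toy_stem(word):
    """Strip a few common English suffixes (illustrative only)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of stem and leave "ss" endings alone.
        if (
            word.endswith(suffix)
            and len(word) - len(suffix) >= 3
            and not word.endswith("ss")
        ):
            return word[: -len(suffix)]
    return word


for word in ("likes", "movies", "watched"):
    print(word, "=>", toy_stem(word))
```

Note that this toy version would also turn "games" into "game", so it does not reproduce the slide's vocabulary exactly.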

Machine learning: Training & Validation

Python Natural Language Toolkit (NLTK)

Lexical Analysis (tokenizing)
Part of speech tagger
Named entity recognition
Stemmers

“An amazing library to play with natural language”

scikit-learn: Machine Learning in Python

Classification, Regression, Clustering
Dimensionality reduction
Model selection
Preprocessing

Conclusions

Heavy on data preparation and feature generation
Vocabulary requires proper design
Sparse vector representation
Discarding word order may lose context
Stop words may mislead the meaning
Word stemming may limit information

Cloud-init

Chad Smith & Scott Moser, Canonical

cloud-init.io
read the docs
IRC: Freenode #cloud-init, smoser, blackboxsw
Mailing list: cloud-init@lists.launchpad.net

Open Source Summit Europe 2018 - day 2

My personal notes ...