1. Scala for Data Science - by Pascal Bugnion (ASI Data Science)
- Python and R are fine to start with in new data science projects. However, when/if you need to scale them up substantially, Scala is the way to go because of how it handles concurrency (it runs on the JVM).
- The 'problem' with Scala is that it lacks powerful visualization libraries, in contrast with Python or R. One solution is to use a tool such as Plotly.
- Plotly can be used to create and share visualizations online simply by sending your data in a JSON format, among other ways. It takes care of the rest for you.
- You can use Plotly for graphs, dashboards and a number of chart types, and it also works with Python, R, MATLAB and more.
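
As a rough illustration of the JSON route mentioned above, the sketch below describes a chart as plain data and serializes it. The `data`/`layout` shape follows Plotly's figure JSON convention; the helper name `make_scatter_figure` and the sample values are made up for illustration, and the step of actually sending the payload to Plotly is left out.

```python
import json


def make_scatter_figure(x, y, title):
    """Build a Plotly-style figure as a plain dict (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return {
        # One trace: a scatter plot drawn with markers only.
        "data": [{"type": "scatter", "mode": "markers", "x": list(x), "y": list(y)}],
        # Layout holds presentation details such as the title.
        "layout": {"title": {"text": title}},
    }


# Sample values are invented for the example.
figure = make_scatter_figure([1, 2, 3], [2.0, 4.1, 5.9], "Example scatter")

# The JSON string a client could send or save.
payload = json.dumps(figure)
```

Nothing here is Python-specific: any language that can emit JSON, Scala included, could produce the same payload.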
2. Application Architecture for Big Data - Tom White, Head of Development at Method Digital, Prev CTO of Skin Analytics
- Differences between Data Scientists and Developers when working together on the same project
  - Data Scientists
    - Focus on meaningful results
    - Exploration and experimentation
    - Large datasets
    - Preprocessing, model generation
    - Lots of scripting
    - Limited scope for effective code reuse
    - (Sometimes) little knowledge of how software engineering works
  - Developers
    - Focus on stable, secure, rapid iteration
    - Agile Development
    - User Stories
    - Git workflows
    - Continuous Integration
    - Code Reviews
    - User Acceptance Testing
    - DRY Coding
    - (Sometimes) little knowledge of how Data Science works
- Antipattern when Data Scientists and Developers work together: Developers write 'all the code', i.e. they link directly to low-level Data Science components, which often change as experimentation continues
- Suggested approach
  - Separate, co-owned app providing an API
  - Only use the minimum data-science functions you need - 'freeze' them into the API
  - Version the API and make it purely additive
  - Version any datasets too
  - Keep a 'live' version on top for tinkering in test environments if need be
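
One way to picture the suggested approach is the minimal sketch below. It is not from the talk: the class `VersionedApi`, the functions `mean_score` and `max_score`, and the dataset labels are all hypothetical names chosen for illustration. Each release freezes a read-only set of functions, and a new release may only add names, never redefine existing ones.

```python
from types import MappingProxyType


class VersionedApi:
    """Hypothetical registry: each version is a read-only name -> function map."""

    def __init__(self):
        self._versions = []  # index 0 holds version 1

    def release(self, **new_functions):
        """Publish a new version that only adds functions to the previous one."""
        current = dict(self._versions[-1]) if self._versions else {}
        clashes = sorted(set(current) & set(new_functions))
        if clashes:
            raise ValueError(f"already frozen, cannot redefine: {clashes}")
        current.update(new_functions)
        self._versions.append(MappingProxyType(current))
        return len(self._versions)  # the new version number

    def get(self, version, name):
        """Look up a frozen function by version number and name."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"unknown API version: {version}")
        return self._versions[version - 1][name]


def mean_score(rows):
    return sum(rows) / len(rows)


def max_score(rows):
    return max(rows)


# Datasets are versioned too (values invented for the example).
DATASETS = {"ds-v1": [2, 4, 9], "ds-v2": [2, 4, 9, 5]}

api = VersionedApi()
v1 = api.release(mean_score=mean_score)  # version 1 exposes one function
v2 = api.release(max_score=max_score)    # version 2 adds one, changes nothing

result = api.get(v1, "mean_score")(DATASETS["ds-v1"])
```

Because callers pin both an API version and a dataset version, a result can be reproduced later even while experimentation continues elsewhere. A 'live' version for tinkering could sit on top as an extra, unfrozen mapping used only in test environments.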
3. Google Big Data Lifecycle