Talks & Tutorials Schedule
Talks

 Keynote: f(x) = a + bi  

Your (x) Future, f, Depends on Atoms (Real) and Bits (Imaginary: Cyber)

Mitsunobu Koshiba, President and CEO, JSR 

The talk highlights the technical and economic issues of current AI technologies, namely the extremely high computation costs and massive energy consumption of von Neumann computers. To overcome these two issues, many innovative efforts are under way to develop non-von Neumann computing technologies such as quantum computing and neuromorphic devices, which will lead us to the era of “cognitive computing.” This is what is happening in cyberspace.

 

In parallel, technologies in the real (atom) space are striving for disruptive innovation. Disruptive technologies such as 3D printing, 5G, and satellite communications will be introduced. These technologies complement the innovation in the bit (cyber) space and will lead us, perhaps, to the next industrial revolution.

 

I would like to conclude my talk by encouraging the young SciPy audience, who have plenty of knowledge of bits, to pay attention to atoms (the real) so that they can drive our future with an excellent blend and integration of real science/engineering and IT.

Presented in Japanese. Simultaneous translation to English.

Re-run, Repeat, Reproduce, Reuse, Replicate: Transforming Code into Scientific Contributions

Fabien Benureau, Okinawa Institute of Science and Technology (OIST)

Nicolas Rougier, INRIA

 

Through the evolution of a small random walk example implemented in Python, we'll illustrate some of the issues that may plague scientific code. The code may be correct and of good quality, yet many problems may still reduce its contribution to scientific knowledge. To make these problems explicit, we'll articulate five characteristics that code should possess to be a useful part of a scientific publication: it should be re-runnable, repeatable, reproducible, reusable, and replicable.
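The repeatability issue can be made concrete with a random walk like the one the talk builds on. The sketch below is hypothetical (not the speakers' code): making the random seed an explicit parameter, rather than relying on hidden global state, is what lets a run be repeated exactly and recorded alongside the results.

```python
import random

def random_walk(n_steps, seed):
    """A 1-D random walk of n_steps, repeatable via an explicit seed."""
    rng = random.Random(seed)            # local generator: no hidden global state
    position, path = 0, [0]
    for _ in range(n_steps):
        position += rng.choice([-1, 1])  # step left or right with equal probability
        path.append(position)
    return path

# The same seed always yields the same walk (repeatable); recording the
# seed with the published figure is one step toward reproducibility.
walk_a = random_walk(10, seed=42)
walk_b = random_walk(10, seed=42)
assert walk_a == walk_b
```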

Presented in English. Simultaneous translation to Japanese.

TFX: Production ML Pipelines with TensorFlow 

Robert Crowe, Google

Putting together an ML production pipeline for training, deploying, and maintaining ML and deep learning applications is much more than just training a model. Google has taken years of experience in developing production ML pipelines and offered the open source community TensorFlow Extended (TFX), an open source version of tools and libraries that Google uses internally.

Presented in English. Simultaneous translation to Japanese.

Evolution of the Enthought Platform 

Mark Dickinson, Enthought/Python Core Developer

Bio:  Mark is a member of the core Python development team with expert emphasis on Python’s numeric code. He has held teaching and research positions at the University of Michigan, the University of Pittsburgh, and the National University of Ireland, Galway. Mark holds a Ph.D. in pure mathematics from Harvard University and a B.A. in pure and applied mathematics from the University of Cambridge.

Presented in English. Simultaneous translation to Japanese.

Let's Enjoy the Python World Using Network Analysis ~ Surveying the Reference Relationships of PEPs with NetworkX

Tomoko Furuki

This talk introduces discoveries about the reference relationships between PEPs (Python Enhancement Proposals), obtained using NetworkX.

 

Network analysis is an approach used to explore the structure of relationships between "something" and "something" (e.g. a friendship network or a citation network). By focusing not only on individual elements but also on their relationships, you may gain new insights. If you are a Python user, NetworkX will help you get started with network analysis.
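The kind of insight meant here can be sketched with plain Python (the PEP pairs below are hypothetical examples, not the talk's data; NetworkX would model the same thing with `DiGraph` and `in_degree`):

```python
from collections import Counter

# Hypothetical "A references B" pairs; the real data would come from the PEP texts.
references = [
    ("pep-0008", "pep-0007"),
    ("pep-0484", "pep-3107"),
    ("pep-0526", "pep-0484"),
    ("pep-0544", "pep-0484"),
]

# In-degree = how often a PEP is referenced: a network-level insight that
# reading PEPs one by one would not reveal.
in_degree = Counter(target for _source, target in references)
most_referenced, count = in_degree.most_common(1)[0]
print(most_referenced, count)  # → pep-0484 2
```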

 

This talk will also introduce a website that interactively visualizes PEP's reference relationships (https://github.com/komo-fr/pep_map_site).

Presented in Japanese.

Next-Level Art: Becoming More Creative with AI

Max Frenzel, Qosmo

There has been tremendous progress in the development of AI over the past few years. Even art and creativity, fields that are by many considered as distinctly human, have not been left unaffected. While many either fear that AI will replace or substitute humans, or argue that an AI can never be creative and that anything generated by AI is by definition not art, I want to present an alternative view. I believe that advanced AI will allow us humans to focus on what makes us truly human, and provide us with new tools for creative exploration. I will present NeuralFunk, an experiment in using deep learning for sound design. NeuralFunk is an experimental track entirely made from samples that were synthesized by neural networks. It is not music made by AI, but music made using AI as a tool for exploring new ways of creative expression.

Presented in English. Simultaneous translation to Japanese.

daskperiment

Masaaki Horikoshi, ARISE Analytics

Data analysis, including machine learning, requires a lot of trial and error. The results depend on hyperparameters, code, packages, etc., so you may not be able to reproduce past results. This talk will show you what you need to do to make your machine learning experiments reproducible, and the tools for doing so. The core tool, 'daskperiment', is intuitive to use regardless of machine learning algorithm or package, and tracks the information you need. Internally, it uses the Dask machinery to efficiently execute the steps of an experiment. Users can adopt this package to make their own experiments reproducible.

Presented in Japanese.

SEM Image Noise Reduction           

Shinji Kobayashi, Tokyo Electron Kyushu LTD

With the latest electronic devices, semiconductor features are very small, and it is very important to control pattern size and variation. Scanning electron microscopes (SEMs) generate images of these patterns by detecting secondary electrons emitted from incident electrons. These images have noise that must be removed to properly measure pattern variation. Typical filtering methods, such as a box or Gaussian filter, introduce undesired effects into the statistical analysis. The authors have developed a way to reduce image noise without filtering. In this presentation, we will discuss how to remove image noise by generating artificial SEM images.
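This is not the authors' method (which generates artificial SEM images), but a toy sketch of why filter-free noise reduction is attractive: averaging independent noisy frames lowers noise roughly as 1/√N without smearing the edge the way a box or Gaussian filter would.

```python
import random
import statistics

rng = random.Random(0)
true_profile = [0.0] * 20 + [1.0] * 20        # an ideal step edge, like a pattern boundary

def noisy_frame():
    """One simulated acquisition: the true profile plus Gaussian noise."""
    return [v + rng.gauss(0, 0.2) for v in true_profile]

# Averaging 16 independent frames reduces the noise standard deviation
# by about a factor of 4, while the step edge stays perfectly sharp.
n_frames = 16
frames = [noisy_frame() for _ in range(n_frames)]
averaged = [sum(col) / n_frames for col in zip(*frames)]

residual = [a - t for a, t in zip(averaged, true_profile)]
print(round(statistics.pstdev(residual), 3))  # roughly 0.2 / sqrt(16) = 0.05
```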

Presented in Japanese.

Chainer: A Deep Learning Framework for Fast Research and Applications

Crissman Loomis, Preferred Networks

Chainer is a deep learning framework for flexible and intuitive coding of high-performance experiments and applications. It is designed to maximize trial-and-error speed with its Define-by-Run paradigm, which provides Pythonic programming of auto-differentiated neural networks. The framework can accelerate performance with multiple GPUs in distributed environments, and add-on packages enable quickly jumping into specific domains. In this talk, we give an overview of Chainer’s API, its capabilities for accelerating deep learning research and applications, and the future direction of the framework's development.

Presented in English. Simultaneous translation to Japanese.

Optuna: A Define-by-Run Hyperparameter Optimization Framework

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta and Masanori Koyama, Preferred Networks

 

In this talk, we introduce Optuna, a next-generation hyperparameter optimization framework with three new design criteria: (1) a define-by-run API that allows users to concisely construct dynamic, nested, or conditional search spaces, (2) efficient implementations of both sampling and early-stopping strategies, and (3) an easy-to-set-up, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to lightweight experiments conducted on a local laptop. Our software is available under the MIT license (https://github.com/pfnet/optuna/).
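The define-by-run idea can be illustrated without Optuna itself. In this toy sketch (plain random search with a hypothetical `Trial` class, not Optuna's implementation), the search space is constructed while the objective runs, which is what makes nested or conditional spaces natural to express:

```python
import random

class Trial:
    """A toy stand-in for a define-by-run trial (not Optuna's actual API)."""
    def __init__(self, rng):
        self.rng = rng
        self.params = {}

    def suggest_float(self, name, low, high):
        self.params[name] = self.rng.uniform(low, high)
        return self.params[name]

    def suggest_categorical(self, name, choices):
        self.params[name] = self.rng.choice(choices)
        return self.params[name]

def objective(trial):
    # The space is built as the code runs: "degree" exists only when the
    # polynomial branch is taken -- a conditional search space.
    model = trial.suggest_categorical("model", ["linear", "poly"])
    x = trial.suggest_float("x", -10, 10)
    if model == "poly":
        degree = trial.suggest_categorical("degree", [2, 3])
        return abs(x) ** degree
    return abs(x)

# Plain random search over 100 define-by-run trials.
rng = random.Random(0)
results = []
for _ in range(100):
    trial = Trial(rng)
    results.append((objective(trial), trial.params))
best_value, best_params = min(results, key=lambda r: r[0])
print(best_value)
```

Optuna replaces the random sampling with smarter strategies and adds early stopping, but the objective-function shape is the same.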

Presented in English. Simultaneous translation to Japanese.

Progression of a Scientist with Zero Programming Skills to Python User

Takayuki Miki, Tokyo Institute of Technology

Presented in Japanese. 

RAPIDS: GPU Accelerated Data Science

Akira Naruse, NVIDIA

RAPIDS has a mission to build a ridiculously fast, easy-to-use, open source platform that allows data scientists to explore data, train machine learning algorithms, and build applications while primarily staying in GPU memory. Our goal is that by using RAPIDS to keep tasks on the GPU, data scientists will see runtime speedups from hardware optimizations and productivity speedups from the elimination of glue code. This talk covers where the library is today, what has been done with it and where it’s going, how to get started, and how you can contribute to the GPU-accelerated movement.

Presented in Japanese.

Reproducibility and Deployment of Scientific Code: A Discussion about the SciPy Stack and How EDM Helps           

Didrik Pinte, Enthought

Bio: Didrik is an expert in artificial intelligence, data management, and software development. He honed his leadership by running his own company, providing data management solutions in the environmental sector. Prior, Didrik served as a research assistant at Catholic University of Louvain (UCL) in Belgium, developing Python-based integrated water resource management applications. A proponent of open source software development, Didrik is on the board of NumFocus, and is an organizer and speaker at EuroSciPy. He holds an M.S. in agricultural engineering and an M.S. in management from UCL.

Presented in English. Simultaneous translation to Japanese.

Scaling Your Python Interactive Applications with Jupyter

Luciano Resende, IBM CODAIT/Jupyter Contributor

Jupyter Notebooks have become the "de facto" platform used by scientists and engineers to build Python interactive applications to tackle scientific and machine learning problems. However, with the popularity of big data analytics and complex deep learning workloads, there is a growing requirement to extend computation across a cluster of computers in a parallel fashion. In this talk, we will describe how to use multiple Jupyter Notebook components to enable the orchestration and distribution of interactive machine learning and deep learning workloads across different types of computing clusters, including Apache Spark and Kubernetes. This talk is intended for attendees interested in distributed platforms and scientists experiencing difficulties scaling their scientific workloads across multiple machines.

Bio: Luciano Resende is an STSM and Open Source Data Science/AI Platform Architect at IBM CODAIT (formerly the Spark Technology Center). He has been contributing to open source at the ASF for over 10 years; he is an ASF member and contributes to various big-data-related Apache projects around the Apache Spark ecosystem. Luciano is currently contributing to Jupyter ecosystem projects, building a scalable, secure, and flexible enterprise data science platform.

Presented in English. Simultaneous translation to Japanese.

Apache Arrow - A Cross-language Development Platform for In-memory Data

Kouhei Sutou, ClearCode Inc.

 

Apache Arrow is the future of data processing systems. This talk describes how to solve data-sharing overhead in data processing systems such as Spark and PySpark. It also describes how to accelerate computation on your large data with Apache Arrow.

Presented in Japanese.

CuPy: A NumPy-compatible Library for High Performance Computing with GPU

Masayuki Takagi, Preferred Networks

 

CuPy is an open-source library that provides a NumPy-compatible API and brings high performance to N-dimensional array computation by utilizing NVIDIA GPUs. Its API is designed for high compatibility with NumPy, so in most cases you can gain a severalfold speed improvement by using it as a drop-in replacement in your code. CuPy is actively developed and continuously well-maintained, with 2,700+ GitHub stars and 13,000+ commits. CuPy was also presented at PyCon 2018.
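The drop-in compatibility means array code can be written once against either module. A common sketch of the pattern (simplified: real code may also need to handle a CuPy install without a usable GPU):

```python
import numpy as np

try:
    import cupy as xp  # runs on the GPU when CuPy and a CUDA device are available
except ImportError:
    xp = np            # drop-in fallback: the same code runs on the CPU with NumPy

def normalize(a):
    """Zero-mean, unit-variance normalization, written once for NumPy or CuPy."""
    return (a - a.mean()) / a.std()

data = xp.arange(6, dtype=xp.float64)
print(float(normalize(data).std()))  # ≈ 1.0 on either backend
```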

Presented in English. Simultaneous translation to Japanese.

Tutorials

Advanced Machine Learning 

Alexandre Chabot-Leclerc, Enthought 

Scikit-learn is a powerful machine learning library in Python. In this 3.5-hour tutorial, we will cover some advanced topics of the library, such as pipelines, grid search, and cross-validation for making reproducible analyses. We will also discuss feature selection to reduce computation time and prevent overfitting. Finally, we will use some of scikit-learn's built-in functionality to work with text data.
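The pipeline-plus-grid-search pattern the tutorial covers looks roughly like this (a minimal sketch on the built-in iris data; the parameter grid is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A pipeline keeps scaling inside cross-validation, so the held-out folds
# never leak into the fitted scaler: the analysis stays reproducible.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```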

 

Prerequisites: This tutorial assumes prior experience with the scikit-learn API. We will review it at the beginning of the tutorial to make sure everyone is on the same page. It also assumes that the participants are comfortable using NumPy, Pandas, and matplotlib. Some knowledge of Seaborn is also useful, but not essential.

Presented in English. Simultaneous translation to Japanese.

Alexandre Chabot-Leclerc is a Python trainer and developer at Enthought. He holds a Ph.D. in Electrical Engineering from the Technical University of Denmark. His graduate research was in the field of hearing research, where he developed models of human speech perception. Alexandre's interests include teaching, psychoacoustics, and rock climbing.

Advanced NumPy 

Juan Nunez-Iglesias, Monash University

A hands-on tutorial covering broadcasting rules, strides / stride tricks, and advanced indexing.

Prerequisites: Comfortable with Python syntax, and some familiarity with NumPy / array computing.
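A taste of two of the topics, as a sketch (the array values are arbitrary; newer NumPy also offers `numpy.lib.stride_tricks.sliding_window_view` as a safer alternative to `as_strided`):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) table
# without either array being copied or tiled in memory.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
table = col * 10 + row
print(table.shape)  # (3, 4)

# Stride tricks: a sliding window over an array as a zero-copy view.
a = np.arange(6)
step = a.strides[0]                    # bytes between consecutive elements
windows = as_strided(a, shape=(4, 3), strides=(step, step))
print(windows[0], windows[-1])         # [0 1 2] [3 4 5]
```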

Presented in English. Simultaneous translation to Japanese.

 

Bio: Juan Nunez-Iglesias is a Research Fellow and CZI Imaging Software Fellow at Monash University in Melbourne, Australia. He is a core developer of scikit-image and has taught scientific Python at SciPy, EuroSciPy, the G-Node Summer School, and at other workshops. He is the co-author of the O'Reilly title "Elegant SciPy".

Tensorflow 

Josh Gordon, Google

A hands-on introduction to TensorFlow 2.0. In this 3.5-hour tutorial, we will briefly introduce TensorFlow, then dive into training neural networks. This tutorial is targeted at folks new to TensorFlow and/or deep learning. Our goal is to help attendees get started efficiently and effectively, so they can continue learning on their own. Attendees will need a laptop with an internet connection; there is nothing to install in advance.

Prerequisites: Prior machine learning experience is not assumed. We will do our best to introduce relevant concepts as needed. The goal of our tutorial is not to teach you everything you need to know, but to get attendees started and overcome any initial barriers, so they can continue learning on their own.

 

Presented in English. Simultaneous translation to Japanese.

 

Bio: Josh Gordon works on the TensorFlow team at Google, and teaches Applied Deep Learning at Columbia University. He has over a decade of machine learning experience to share. You can find him on Twitter at https://twitter.com/random_forests.

Intro to Visualization

Manabu Terada, PythonED

Introduction to Visualizations in Python using Jupyter Notebooks.

 

Presented in Japanese.