
Wednesday, March 4, 2015

Strata+Hadoop World PyData Recap


PyData at Strata

  • For machine learning, use scikit-learn
  • For time-series visualization, use Bokeh
  • If performance is an issue, use Numba
  • For loading data, use pandas
  • Do all of the work in IPython Notebook
  • To share .ipynb files, use Jupyter


Python has become an increasingly important part of the data engineering and analytics tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy, matplotlib for visualization, SciPy, and scikit-learn, as well as how to scale Python performance and handle large, distributed data sets. Come see how the leading lights of the Python data community are making Python ever more useful to data analysts and data engineers.
9:00am – 10:30am
Track 1 (room LL21 B):
  • Machine Learning with scikit-learn
    Andreas Mueller
scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry. scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models. The tutorial will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross-validation and model selection. We will see how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline.
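As a rough illustration of the workflow this abstract describes (supervised learning, cross-validation, model selection and a pipeline), here is a minimal sketch using scikit-learn's bundled iris dataset. The dataset, the StandardScaler + LogisticRegression pipeline and the grid of C values are assumptions chosen for illustration, not the tutorial's own material. It uses the current sklearn.model_selection module; scikit-learn releases from 2015 exposed these helpers under sklearn.cross_validation and sklearn.grid_search instead.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Supervised learning: features X with known class labels y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Pipeline: scaling and the classifier are fitted together, so the scaler
# only ever sees the training folds during cross-validation.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation: 5-fold accuracy estimate on the training set.
scores = cross_val_score(pipe, X_train, y_train, cv=5)
print("CV accuracy per fold:", scores)

# Model selection: search over the regularization strength C.
# make_pipeline names each step after its lowercased class name.
grid = GridSearchCV(
    pipe, param_grid={"logisticregression__C": [0.1, 1.0, 10.0]}, cv=5
)
grid.fit(X_train, y_train)
test_accuracy = grid.score(X_test, y_test)
print("best params:", grid.best_params_)
print("held-out accuracy:", test_accuracy)
```

Keeping preprocessing inside the pipeline is the design point here: cross_val_score and GridSearchCV refit the whole pipeline per fold, which avoids leaking test-fold statistics into the scaler.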
11:00am – 12:30pm
Track 1 (room LL21 B):
  • Interactive Web Graphics with Bokeh
    Peter Wang
Bokeh is an open-source library for building web graphics, ranging from simple interactive plots to complex dashboards with streaming data sources. This tutorial will quickly introduce some of the basic concepts behind Bokeh and then dive into a step-by-step series of exercises that show how to embed interactive graphics in an IPython notebook and build more complex linked graphics. Streaming and large datasets will also be demonstrated.
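A minimal sketch of "linked graphics" in Bokeh, assuming a recent Bokeh release: two scatter plots share one ColumnDataSource, so a box selection in one highlights the same points in the other, and they share x/y ranges, so panning or zooming one moves the other. The sine/cosine data, the tool list and the output file name are illustrative assumptions. Inside a notebook you would call output_notebook() and show(layout) instead of writing an HTML file.

```python
import math

from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, save

# Illustrative data: one shared column store feeding both plots.
xs = [i / 10 for i in range(63)]
source = ColumnDataSource(data=dict(
    x=xs,
    sin=[math.sin(x) for x in xs],
    cos=[math.cos(x) for x in xs],
))

TOOLS = "pan,wheel_zoom,box_select,reset"

left = figure(width=350, height=300, tools=TOOLS, title="sin(x)")
left.scatter("x", "sin", source=source)

# Reusing left's ranges links panning/zooming; reusing `source` links selection.
right = figure(width=350, height=300, tools=TOOLS, title="cos(x)",
               x_range=left.x_range, y_range=left.y_range)
right.scatter("x", "cos", source=source)

layout = gridplot([[left, right]])
output_file("linked_plots.html", title="Linked plots sketch")
save(layout)
```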
1:30pm – 3:00pm
Track 1 (room LL21 B):
  • Intro to Numba and Performance Python
    Travis Oliphant
Numba is a just-in-time compiler for Python that can translate a wide range of Python functions into high-performance machine code at runtime. This tutorial will give an overview of the capabilities of the Numba compiler and walk through several examples showing how to use Numba to generate fast implementations of numerical algorithms from pure Python. At the end, we will briefly touch on more advanced features of Numba, such as compiling for the GPU.
Prerequisites: a basic installation of Anaconda. Example IPython notebooks will be posted to GitHub before the tutorial.
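To make "fast implementations of numerical algorithms from pure Python" concrete, here is a minimal sketch of the usual Numba pattern: a plain loop-heavy Python function decorated with @njit, which Numba compiles to machine code on its first call. The function (sum of pairwise Euclidean distances) is a made-up example, not one from the tutorial. The ImportError fallback is an assumption added so the sketch still runs, slowly, where Numba is not installed.

```python
import math

import numpy as np

# Assumption: Numba is installed. If it is not, fall back to a no-op
# decorator so the sketch still runs as ordinary (interpreted) Python.
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func


@njit
def pairwise_distance_sum(points):
    """Sum of Euclidean distances over all unordered pairs of rows."""
    n, dims = points.shape
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sq = 0.0
            for k in range(dims):
                diff = points[i, k] - points[j, k]
                sq += diff * diff
            total += math.sqrt(sq)
    return total


pts = np.random.default_rng(0).random((200, 3))
# The first call triggers compilation; later calls reuse the compiled code.
print(pairwise_distance_sum(pts))
```

The explicit triple loop is deliberate: this style is slow in CPython but is exactly what Numba's nopython mode compiles well, without rewriting the algorithm in vectorized NumPy.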
3:30pm – 5:00pm
Track 1 (room LL21 B):
  • Analytics Beyond the Basics with pandas and SQL
    Wes McKinney
In this tutorial, we’ll take a tour through a variety of useful but sometimes tricky analytical tasks and show how they can be tackled with pandas or SQL. Part of the goal is to illustrate how SQL concepts map onto the pandas API and vice versa, and for participants to learn more about advanced usage of each tool.
Materials will be posted at http://github.com/wesm/strata-sj-2015
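To illustrate the SQL-to-pandas mapping this abstract mentions, here is a small sketch that runs the same WHERE + GROUP BY query twice: once in SQLite through Python's built-in sqlite3 module, and once as a pandas boolean filter followed by groupby. The sales table and its columns are made up for illustration and are not taken from the tutorial materials.

```python
import sqlite3

import pandas as pd

# Hypothetical sales table, used only for illustration.
df = pd.DataFrame({
    "city": ["SF", "SF", "NYC", "NYC", "NYC"],
    "sales": [10, 20, 5, 15, 25],
})

# SQL version: load the frame into an in-memory SQLite database and query it.
conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT city, SUM(sales) AS total
    FROM sales
    WHERE sales > 5
    GROUP BY city
    ORDER BY city
    """,
    conn,
)
conn.close()

# pandas version: WHERE -> boolean mask, GROUP BY + SUM -> groupby().sum(),
# column alias -> rename, ORDER BY -> sort_values.
pandas_result = (
    df[df["sales"] > 5]
    .groupby("city", as_index=False)["sales"]
    .sum()
    .rename(columns={"sales": "total"})
    .sort_values("city")
    .reset_index(drop=True)
)

print(sql_result)
print(pandas_result)
```

Both versions produce one row per city (NYC 40, SF 30), which makes it easy to check a pandas translation of a query against the SQL original.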
