StatLab Workshops Spring 2017

To view these workshops in a calendar, see

Geographic Information System (GIS) Workshops

Intro to GIS: Mapping with ArcGIS | This is an introduction to the basic concepts of creating, managing and analyzing explicitly spatial data within a Geographic Information Systems (GIS) framework. Included is a step-by-step, “hands on” introduction to using spatial data within ESRI’s ArcGIS software. Topics will include: Spatial Data Models, Spatial Relationships, The ArcMap User Interface, Thematic Mapping Using Symbology, and Simple Analysis Using Complex Selection Methods. (Registration: 1/27 | Registration: 4/5)

Geocoding | Geocoding is a geoprocessing technique that allows you to derive latitude and longitude coordinates from an address database. As a result, the original table of addresses can be mapped, enabling the power of geospatial analysis. This workshop will provide an overview of geocoding and the most important considerations in conducting your geocoding project. We will review several desktop and web-based geocoding options, such as ArcGIS, QGIS, and Python. After the workshop, researchers will have a good understanding of how to conduct geocoding with sensitive data and how to assess the accuracy of geocoded results. Note: Trainers will stay until 2 p.m. in case you would like to discuss your project or work on a hands-on geocoding exercise. (Registration: 2/2)

Intro to GIS: Mapping with QGIS | This workshop will introduce basic concepts of geographic information systems through the use of QGIS, a free and open-source GIS that can be used on Windows, Mac OS, and Linux platforms. We will cover installation and adding plugins, projection and coordinate systems, types of spatial data, transforming tabular into spatial data, creating color-coded maps, and conducting basic spatial analysis to help you start with your research. (Registration: 2/17)

Intermediate GIS: ArcGIS | This workshop will build upon the Introduction to GIS workshop by introducing students to a variety of tools for spatial analysis. Students will use the ArcGIS software suite to load, manipulate, analyze, and visualize data. We will consider strategies for working with both vector and raster data. As time allows, topics to be covered include: coordinate systems and projections, geospatial data management, creating spatially explicit datasets (geocoding and georeferencing), measures of central tendency (spatial means, standard distances, proximity), estimating geographic distributions (interpolation, kernel density estimation), measuring geographic distributions (spatial autocorrelation and clustering/’hot spot analysis’), basic raster analysis (local, focal, and zonal functions), and non-Euclidean raster operations (impedance layers, cost distance, least cost paths, cost allocation). Prerequisites: Prior use of ArcGIS is required. (Registration: 3/10)
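As a rough illustration of two of the measures named above (the spatial mean and the standard distance), here is a minimal sketch in plain Python. It is not the ArcGIS implementation; it assumes points are given as (x, y) pairs in a projected, planar coordinate system, and the function names and sample points are hypothetical.

```python
import math


def mean_center(points):
    """Return the spatial mean (average x, average y) of planar points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Return the root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


# Hypothetical sample data: the four corners of a 2 x 2 square.
SAMPLE_POINTS = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

if __name__ == "__main__":
    print(mean_center(SAMPLE_POINTS))
    print(standard_distance(SAMPLE_POINTS))
```

Latitude and longitude values would need to be projected first, since the distance formula here treats coordinates as flat.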

Intro to ModelBuilder: ArcGIS | ArcGIS ModelBuilder is a visual programming language for automating repetitive geoprocessing. It has an easy-to-use flowchart-like interface that allows users to drag, drop, link, and loop ArcToolbox tools and input data files for quick, repetitive execution. Models are easy to troubleshoot and change, and at the end, users can export a rudimentary Python script for further development. This hands-on workshop assumes some prior familiarity with ArcGIS and Toolbox tools. (Registration: 4/7)

Intro to Spatial Analysis | This course will focus on developing spatial questions, visualizing spatial data, and creating statistically sound analysis plans for spatially relevant data. Specifically, the course will include techniques within the ArcGIS environment to visualize and explore spatial data, assess geographic clustering, and perform basic spatial prediction. The course is designed to provide a foundation in spatial statistics and highlight key caveats and questions to consider when working with spatial data. Prospective attendees should have working/intermediate knowledge of ArcGIS. All tools and data will be available on site, using software available to all Yale faculty and students. (Registration: 4/7)

R Workshops

Intro to R | R is a free, open source development language for statistical computing and graphics. Because of its price and large development community, R is quickly becoming the statistical application of choice at Yale. R has add-ons for GIS, graphing, advanced statistics, econometrics, image analysis and more. This class offers an extremely basic introduction to the programming language and resources available. Basic statistical understanding is expected. (Registration: 2/10 | Registration: 3/31)

Data Visualization with R: ggplot | (Registration: 4/21)

SPSS Workshops

Intro to SPSS | SPSS is a flexible and user-friendly statistical software package known for its graphics, quick assessment tools and easy programming language. SPSS also works directly with Excel files. Widely used in all of the social sciences, SPSS offers add-ons which enable qualitative analysis, missing values analysis, and survey design. This class offers a very basic introduction to the application GUI and coding mechanisms. Basic statistical understanding is helpful but not required. (Registration: 2/17)

Intermediate SPSS | This workshop will build upon the Introduction to SPSS workshop. The focus will be on testing for moderation using Analysis of Variance and multiple linear regression. Sample topics include decomposing interactions with post-hoc tests and planned contrasts, and simple slopes analysis. Basic statistical understanding and beginner knowledge of SPSS are expected. (Registration: 3/10)

STATA Workshops

Intro to STATA | Stata is a popular integrated statistical program used by academic researchers across campus, especially in economics, political science, and EPH. If you are a total to moderate rookie with Stata (i.e., have never used or only ever used “regress” for class) and want to learn more about importing, merging, and cleaning your data, this class is for you. We will cover the basics: getting around the program, do files, graphics and table generation. (Registration: 1/27)

Programming & Python Workshops

Intro to Python | Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. In this workshop session, we’ll introduce you to basic Python programming with some examples of simple data analysis and GIS. No programming experience or statistical training required. (Registration: 2/3)

Intro to the Command Line: UNIX/Linux | A lot of software programs do not come with a graphical user interface (GUI), and a Unix command-line terminal environment is required to run such programs. In this 2-hour session, you will learn the basics of a Unix command-line terminal, such as how to navigate the file system, the permission and security structure, and how to run programs from the command line. No previous Unix or command-line experience is required to attend this session. (Registration: 2/24)

Web Scraping with Python | Websites can be full of useful data that are not always downloadable or easily accessible. Rather than doing a manual copy/paste of a site, Python allows you to access the raw HTML behind every webpage and automate the process of retrieving, structuring, and outputting data from pages across a domain. This workshop will cover identifying good candidates for scraping, discovering what data can be scraped, and how Python helps automate the process. Attendees are encouraged to bring in examples of sites they want to scrape as there may be some time to discuss individual projects. This class assumes a working knowledge of Python (running code, installing libraries, etc.) and familiarity with HTML structure. (Registration: 2/24)
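To give a sense of what "structuring data from raw HTML" can look like, here is a minimal sketch using only Python's standard library. It is an illustration rather than the workshop's own material: the HTML snippet, the class name LinkCollector, and the function extract_links are all hypothetical, and a real project would first download the page and would often use a third-party parsing library instead.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links as a list of tuples."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page fragment standing in for downloaded HTML.
SAMPLE_HTML = """
<ul>
  <li><a href="/workshops/r">Intro to R</a></li>
  <li><a href="/workshops/python">Intro to Python</a></li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

Once the links are in a list of tuples, they can be written to a CSV file or used to visit further pages on the same site.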

Data Visualization Workshops

Data Visualization with Tableau | This workshop will familiarize you with key issues in data visualization. In addition to covering the fundamental principles behind effective visualizations, we will also touch on common pitfalls that result in confusing or misleading graphics. During the workshop, participants will gain hands-on experience using Tableau, interactive data visualization software, to produce dynamic, compelling visualizations for all kinds of data. (Registration: 2/10)

Data Visualization with R: ggplot | (Registration: 4/21)

Data Science & Analysis

Data Analysis for Beginners | Details forthcoming (Registration: 2/3)

Managing & Cleaning Research Data | Data doesn’t have to be messy. Part I of this workshop will introduce researchers (from postdocs to undergrads) to the fundamentals of research data management. You’ll learn about the data life cycle: creating, processing, analyzing, preserving, giving access to, and re-using data. We’ll discuss how to identify the current best practices in your field and any funder or publisher mandates that you’ll need to be aware of. Topics will include metadata standards, data documentation, data preservation, and how to access Yale’s many resources for data management help. In addition, we’ll discuss data management guidelines for NIH, NSF, and NEH grants. Part II will introduce the open source data cleaning tool OpenRefine and demonstrate how this powerful, free software can help normalize, clean, and structure your messy research data. We will use sample messy data for the workshop, but attendees are welcome to bring their own messy data for tips on cleaning. (Registration: 3/3)

Intro to Machine Learning | This workshop will serve as a theoretical and practical introduction to using machine learning methods to solve problems in clinical research. It takes a high-level approach, with almost no equations, and uses examples drawn from various areas of biology. The workshop will include an introduction to the field, and then a broad tour through the machine learning “pipeline”: feature selection, algorithm selection, measuring performance, and model validation. Some example code will be provided. (Registration: 3/31)

See past workshop offerings:…

Download workshop materials (handouts, sample data, etc):…