Python

The material in this section was largely borrowed from a paper we presented at the NeuroIS Retreat in 2019. The reference to the paper is given below:

Conrad, C., Agarwal, O., Calix Woc, C., Chiles, T., Godfrey, D., Krueger, K., Marini, V., Sproul, Al. and Newman, A. (2019). On Using Python to Run, Analyze and Decode EEG Experiments. 2019 NeuroIS Retreat, spring 2019.

Introduction

Python is a free and open source general purpose programming language that is designed to be readable by people who are not software developers. At NCIL, we use Python because it supports so many different tasks, ranging from creating graphics to analyzing EEG data to conducting machine learning. Like most programming languages, Python has a steep learning curve. Fortunately, you will not be expected to master it, though you will be expected to adapt pre-existing code to suit your needs. Learning how to program is very helpful, and it is strongly recommended if you plan to be around for more than a few years (e.g., graduate students).

Learning the basics of Python programming

For most people, the best way to learn how to program is to have a problem that you would like to solve. Programming is a bit like riding a bike: there are professional cyclists, but everyone starts by learning how to ride in the first place. For most lab members, your "bike" will be your honors or graduate project. You will need to conduct an experiment, collect data, analyze data and disseminate the results. Python can take on most of the heavy lifting in these tasks thanks to its extensive libraries. However, you will need to create the programs necessary to conduct your research project, either by adapting someone else's code or by writing the code yourself from scratch.

There are dozens of good free tools for learning Python. Codecademy, for instance, is widely regarded as a great learning source and has good free content. W3Schools also has a solid introduction to the basics of many common programming and web languages. Alternatively, you can find one of the O'Reilly Python books lying around the lab to learn more.

Python tools for EEG

Figure 1 illustrates the various Python tools used in the lab and Table 1 lists the Python packages that comprise the stack we use in our experimental protocols. The base of this is Anaconda (see Anaconda, 2019), a collection of Python packages designed for scientific computing, which comes bundled with a “package manager”: a tool for installing and updating packages that ensures that all are compatible and interoperable with each other. The value of Anaconda is that by downloading and installing this single distribution, the user is readily equipped with a wide variety of Python packages that will work together, without the overhead of identifying the necessary set of packages for a task and resolving compatibility issues.

Figure 1. Illustration of the Python Stack and EEG data analysis process.

The second tool highlighted in Table 1 is Jupyter (see Jupyter, 2019). This is a scientific “notebook” application which allows the user to write and execute code, view and save the results, and write rich-text documentation using Markdown formatting, all in a single file that is accessed via a Web browser. This has significant advantages over other approaches to using Python or other programming languages, such as interacting with a command line or using an integrated development environment; because all elements of the process are encapsulated in a single file, it is very easy to share and reproduce analysis pipelines across experiments and between labs.

Another advantage of Jupyter notebooks is that, once a pipeline has been implemented in a notebook, e.g., for the processing of EEG data from an individual participant, the notebook can simply be copied and re-run for each additional participant, and/or easily adapted to new experiments. This means that while proficiency in the Python language is necessary to build the pipelines in the first place, users with little to no programming expertise can readily adapt and run these notebooks for new participants or groups of participants. This makes notebooks an excellent entry point for new researchers who wish to engage in NeuroIS research without first learning Python programming, not only because the learning curve is less steep, but also because the notebook format makes it easy for senior lab members to audit others’ work to ensure quality control. In this regard it is notable that our lab moved to this pipeline several years ago from the MATLAB-based EEGLAB platform, which is also widely used in cognitive neuroscience research (Delorme and Makeig, 2004). While EEGLAB offers a menu-driven graphical user interface (GUI), users have to choose the appropriate menu items and manually enter the appropriate parameters each time, and these settings are not all recorded in the output, which makes the process both more error-prone and more difficult to audit.
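One way to picture the "copy and re-run" workflow is to keep every setting in a single block at the top of the notebook and to save that block with the output, so that the choices made for each participant can be audited later. The following is only a minimal sketch of that idea: the parameter names, values, and participant IDs are hypothetical and are not taken from the lab's actual notebooks.

```python
import json

# Hypothetical settings, defined once at the top of a notebook.
PARAMS = {
    "participant_id": "P01",
    "filter_low_hz": 0.1,
    "filter_high_hz": 30.0,
    "epoch_tmin_s": -0.2,
    "epoch_tmax_s": 0.8,
}


def params_for(participant_id, base=PARAMS):
    """Return a copy of the settings with only the participant ID changed."""
    return {**base, "participant_id": participant_id}


def settings_record(params):
    """Serialize the settings so they can be stored next to the results."""
    return json.dumps(params, indent=2, sort_keys=True)


# Re-running for a new participant only requires changing the ID.
for pid in ["P01", "P02", "P03"]:
    record = settings_record(params_for(pid))
    print(record)
```

Because the record is plain JSON, a senior lab member can compare the settings used for two participants without opening the notebooks themselves.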

Table 1. Description of the recommended stack of Python tools for EEG analysis

| Tool Name | Developers | Description |
|---|---|---|
| Anaconda | Anaconda Inc. | A distribution of the Python programming language for scientific computing. |
| Jupyter | Pérez and Granger (2015) | A notebook format for sharing code and computational narratives. |
| Matplotlib | Hunter (2007) | A 2D graphics package for the creation of publication-quality images. |
| NumPy | van der Walt et al. (2011) | A library for scientific computing and analysis. |
| Pandas | McKinney (2011) | A data library optimized for manipulating large and time series data. |
| PsychoPy | Peirce (2007) | An application and library used to run psychology and neuroscience experiments. |
| MNE-Python | Gramfort et al. (2014) | A library for preparing, analyzing and visualizing MEG, EEG and other related data. |
| scikit-learn | Pedregosa et al. (2011) | A machine learning library. |

The other packages listed in Table 1 include a set of very widely-used tools for scientific computing (Matplotlib, NumPy, and Pandas), PsychoPy for experimental programming and data collection, MNE-Python (hereafter referred to as MNE) for EEG data preprocessing and analysis, and scikit-learn for machine learning. In what follows we describe the steps involved in running and analyzing the results of an EEG experiment using these packages. Note that the Matplotlib, NumPy, and Pandas packages are not explicitly described but are used by the tools that are described.

How to run experiments and process data

Experimental Protocol and Data Collection

An EEG experiment begins with presentation of stimuli to a participant, time-locked with collection of behavioral, EEG, and possibly other physiological measures. Software capable of precise time-locking is essential here, because measures such as EEG have temporal precision on the order of milliseconds. Some mode of inter-device communication is also required, because physiological data such as EEG is typically recorded on a separate device from that controlling stimulus presentation. The PsychoPy library (Peirce, 2007) provides an environment for the presentation of a wide range of stimuli such as images, sounds, and movies, as well as collection of behavioral and vocal responses, and the ability to send precisely time-locked “trigger codes” to other hardware such as EEG data collection systems. These trigger codes are essential for later data analysis as they store, in the EEG data file, both the precise timing of events of experimental interest (e.g., stimulus onset, response times), and the identity of these events (e.g., stimulus type, correct vs. incorrect response). PsychoPy offers the ability to build experiments using either a GUI, which translates the user’s design into Python code, or writing Python code directly. This again allows users with varying levels of expertise to participate fruitfully in the research enterprise. In addition to output sent to other devices, PsychoPy will save the precise timing of all events in the experiment to a text file for later analysis in any package the user desires.
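The role of trigger codes can be sketched in a few lines. The example below is a simplified illustration, not the lab's experiment code: the event names and code values are hypothetical, and a stand-in object replaces the hardware port so the sketch runs without any equipment. In a PsychoPy experiment the port would typically be a psychopy.parallel.ParallelPort object, whose setData method writes a value to the port; the port address depends on the recording hardware.

```python
# Hypothetical mapping from events of interest to trigger codes.
# Codes between 1 and 255 fit in the single byte a parallel port can send.
TRIGGER_CODES = {
    "stimulus_standard": 1,
    "stimulus_target": 2,
    "response_correct": 10,
    "response_incorrect": 11,
}


def send_trigger(port, event_name):
    """Look up the code for an event and write it to the port."""
    code = TRIGGER_CODES[event_name]
    port.setData(code)
    return code


class FakePort:
    """Stand-in for a hardware port; it only remembers what was sent."""

    def __init__(self):
        self.sent = []

    def setData(self, value):
        self.sent.append(value)


port = FakePort()
send_trigger(port, "stimulus_target")   # marks stimulus onset and type
send_trigger(port, "response_correct")  # marks the participant's response
print(port.sent)
```

Keeping the mapping in one dictionary means the same names can be reused during analysis to tell the EEG software which code corresponds to which condition.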

Preprocessing EEG data in Python

Following data collection, EEG data must be preprocessed and analyzed. Preprocessing involves a number of steps designed to improve the signal-to-noise ratio of the data and increase the ability to detect experimental effects, if they are present. In our pipeline, EEG preprocessing and analysis are performed using MNE. MNE provides a collection of data reading and conversion utilities which can be used to import and prepare data from a variety of hardware systems, including most common EEG and MEG systems. MNE converts data into an mne.io.Raw object, which includes the raw timecourse data, time-locked trigger codes, and metadata such as participant ID, date and time of data collection, and the labels for each data channel.

Common preprocessing steps for EEG data (Luck, 2014; Newman, 2019) include: band-pass filtering; removal of data channels (electrodes) and trials contaminated with excessive noise; correction of other well-defined artifacts such as eye blinks, eye movements, and muscle noise; and re-referencing EEG data to one or more electrodes appropriate to the experiment. MNE provides functions dedicated to each of these tasks, which have been designed to implement best practices in EEG/MEG research (e.g., the choice of filter type). This relieves the user of the need to extensively research all of their preprocessing parameter choices, yet still allows control over common parameter choices (e.g., filter bandwidth) through function arguments. As data are processed, they are converted from raw format (continuous EEG data) to MNE’s epochs format (segments of data time-locked to experimental events of interest) and finally to MNE’s evoked format (averages across all epochs of a given category).
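The progression from continuous data to epochs to an evoked average can be illustrated conceptually with plain NumPy arrays. This is a sketch of the idea only, using simulated noise and made-up event positions; it is not how MNE implements these steps, and it omits filtering, artifact rejection, and baseline correction. In MNE itself the corresponding objects are created with mne.Epochs and its average() method.

```python
import numpy as np


def epoch_and_average(raw, event_samples, pre, post):
    """Cut fixed-length segments around each event and average them.

    raw           : array of shape (n_channels, n_samples), continuous data
    event_samples : sample indices at which trigger codes were recorded
    pre, post     : number of samples to keep before and after each event
    """
    # "Epochs": one segment per event, shape (n_epochs, n_channels, n_times)
    epochs = np.stack([raw[:, s - pre:s + post] for s in event_samples])
    # "Evoked": the average across epochs, shape (n_channels, n_times)
    evoked = epochs.mean(axis=0)
    return epochs, evoked


rng = np.random.default_rng(0)
sfreq = 250                               # assumed sampling rate in Hz
raw = rng.normal(size=(3, 10 * sfreq))    # 3 channels, 10 s of simulated noise
events = np.arange(500, 2500, 250)        # made-up event onsets, in samples

# Keep 0.2 s before and 0.8 s after each event.
epochs, evoked = epoch_and_average(raw, events, pre=50, post=200)
print(epochs.shape, evoked.shape)
```

Averaging across epochs is what improves the signal-to-noise ratio: activity that is time-locked to the event survives the average, while unrelated noise tends to cancel out.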

While MNE is under active development, at this time it has a wide variety of tools implementing common functions in EEG preprocessing, such as independent components analysis (ICA) for artifact removal. The supplementary material for this paper includes Jupyter notebooks demonstrating our EEG preprocessing pipeline, including the specific MNE functions and associated parameters used, and documentation elaborating on usage and choice of parameters. Sample Jupyter notebooks for processing can be found at https://github.com/cdconrad/py-bci or on the Architect server.

Analysis and classification

Finally, after preprocessing the data, users can visualize data at the individual or group level, perform analyses to determine if hypothesized effects are present, and/or attempt classification of data based on machine learning. MNE provides tools for the visualization of EEG/MEG data in the time and frequency domains, both as waveform plots at individual channels or clusters of channels, and as scalp topographic maps. It also includes algorithms for source localization, allowing visualization of data on the cortical surface. MNE also provides some tools for statistical analysis, including parametric (t-tests, linear regression) and non-parametric (permutation t-tests, cluster-based permutation tests) approaches and methods for multiple comparison correction, as well as machine learning decoders. However, perhaps one of the most powerful features of the Python stack is the compatibility between the data formats and machine learning libraries; because MNE stores its data in NumPy arrays and can export to Pandas data frames, it is easy to convert MNE data to these packages’ data objects. This allows the use of many other Python packages, such as scikit-learn, a widely-used package implementing a broad range of machine learning tools. MNE data objects can also be readily exported for use in other statistical packages such as R. We will discuss R and what R can be used for in the next section.
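To make the hand-off to scikit-learn concrete, the sketch below classifies epochs stored as a NumPy array of shape (n_epochs, n_channels, n_times), which is the layout MNE returns from epochs.get_data(). The data here are simulated, with an artificial difference between two conditions, and the choice of a standardized logistic regression with 5-fold cross-validation is an assumption made for illustration, not the lab's actual decoding method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_epochs, n_channels, n_times = 80, 4, 50

# Simulated epochs; with real data this array would come from MNE.
data = rng.normal(size=(n_epochs, n_channels, n_times))
labels = np.tile([0, 1], n_epochs // 2)   # two alternating conditions

# Add an artificial effect to condition 1 on the first channel.
data[labels == 1, 0, 20:30] += 2.0

# scikit-learn expects a 2D matrix: one row per epoch, one column per feature.
X = data.reshape(n_epochs, -1)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, labels, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```

With real recordings, the labels would typically be taken from the trigger codes stored with the epochs, so the quality of the decoding depends directly on the trigger codes sent during data collection.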

References

Anaconda, Inc. (2019). Anaconda Distribution: The World’s Most Popular Python/R Data Science Platform. Retrieved from https://www.anaconda.com/distribution/

Conrad, C., Agarwal, O., Calix Woc, C., Chiles, T., Godfrey, D., Krueger, K., Marini, V., Sproul, Al. and Newman, A. (2019). On Using Python to Run, Analyze and Decode EEG Experiments. 2019 NeuroIS Retreat, spring 2019.

Delorme, A. and Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of neuroscience methods 134(1), 9-21.

Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., Parkkonen, L. and Hämäläinen, M. S. (2014). MNE software for processing MEG and EEG data. Neuroimage 86, 446-460.

Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in science & engineering 9(3), 90-95.

Luck, S. (2014). An introduction to the event-related potential technique, Second Edition. MIT Press.

McKinney, W. (2011). Pandas: a foundational Python library for data analysis and statistics. Python for high performance and scientific computing 14.

Newman, A. (2019). Research methods for cognitive neuroscience. SAGE Publications.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825-2830.

Peirce, J. W. (2007). Psychopy—psychophysics software in Python. Journal of neuroscience methods 162(1), 8-13.

Pérez, F. and Granger, B. (2015). Project Jupyter: Computational Narratives as the Engine of Collaborative Data Science. Retrieved from http://archive.ipython.org

Van Der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering 13(2).
