Tutorial for the datastructure_tools package

This toolbox is meant to ease accessing data via Python (e.g. to automate entering data for experiments). Before you run through this tutorial, please follow the installation instructions for this package on GitHub.

If you want to use the virtual environment in Jupyter notebooks, please follow this tutorial#

https://medium.com/@nrk25693/how-to-add-your-conda-environment-to-your-jupyter-notebook-in-just-4-steps-abeab8b8d084

Initial setup:#

The first time you install datastructure_tools, you will have to run the Admin Commander and set a few things.

You will need to set the server address to the database server. The server address is preset. We have two different databases on the server: opto_db (for real data) and opto_db_test, where you can play around without the possibility of destroying any real data. In the beginning, to familiarize yourself with everything, we advise using opto_db_test.
If the server path was not found on initial startup, you have to change it manually. This can be done under the tab “user config”. Using the Browse button, select the folder that leads to the project folder on the server.
Now you can check if your project is already part of the database. We distinguish between two things: the file server, which runs as usual, and the database, which stores metadata about the files on the file server. If the project does not exist on the file server, you will have to ask an IT admin of your choice to create a new project folder. Please keep the naming convention for project folders in mind (see below). As soon as the project folder is created, it exists on the file server, but not yet in the database. To add it, switch to the tab “Project” and add the project to the database: choose the project you want to add, then add a short description that explains what the project is about in a few words. Also add some keywords that we can search for later, such as the main brain structures, the species, etc. Keywords can also be added later, outside of the Admin Commander.
Create the experiment folder inside your project manually. You should have all rights to change everything within your specific project folder (if not, ask an IT admin of your preference). Simply create an experiment folder inside it. Please use snake case, because it leads to the fewest problems later when writing scripts. An experiment name could be, e.g., reversal_learning_ALM_inhibition.
Next, we will check if all the building blocks for the experiment are there. Building blocks represent common things that are reused between different experiments, for example behavior or ephys. We can add a building block once and later reuse it between different experiments. The building blocks themselves tell the scripts which folders are needed on the file server. By default, the folders will be created in 0_raw and 1_preprocessed with the same name. If you want to include the folder in 2_processed as well, tick the modify checkbox and then insert the name in raw, preprocessed and processed. Good practice is to keep the names the same unless there is a good reason to change them between 0_raw, 1_preprocessed and 2_processed. The building blocks should be as general as possible, so that we can reuse the same structure across different experiments. To do all this, switch over to the tab “Building blocks”. Check on the right side whether any building blocks needed for your specific experiment are missing, and if so, add them on the left side by entering a name, a description and the folder names in raw, preprocessed and processed. If a building block name has more than one word, snake case is recommended, e.g. behavior_video.
Combine building blocks into your specific experiment template. The convention here is that everyone makes a template for their specific experiment unless there already is one they can reuse. A template could be, for example, spatial_reversal_learning_miniscope. This template would then include behavior (e.g. a file containing the trial-by-trial results), miniscope (this folder contains the neural activity in the form of miniscope videos) and DAQ (this folder contains the DAQ recordings to bind them together). Another experiment template could combine that with video recordings and then also include the behavior_video building block. If the experiment template name contains more than one word, snake case is recommended, e.g. reversal_learning.
To make your life easier, next create a user config under the tab “user config”. Choose your name (if it is not in the list, create your username under the tab “user”), your preferred project (the one you are usually working on), your preferred experiment, as well as the experiment_template you usually use. On the right side, you can choose the folders that your experimental data usually sits in, which saves time later when you use the File Commander to copy files. All of these settings are saved locally on your computer in a file called user_config.json. So if you set up the datastructure on multiple computers, follow the server config and user config steps for each setup. Press create user config and voilà, you are done with your initial setup! Congratulations!
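
The naming advice in the steps above (snake case for experiment folders, building blocks and experiment templates) can be checked before a folder is created. The following is only a minimal sketch and not part of datastructure_tools: the helper names `is_snake_case` and `to_snake_case` are hypothetical, and uppercase abbreviations such as ALM are allowed on the assumption that names like reversal_learning_ALM_inhibition are meant to be valid.

```python
import re

# Groups of letters/digits separated by single underscores.
# Uppercase is allowed so that abbreviations such as "ALM" pass (assumption).
_SNAKE_CASE = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if the name contains only letters, digits and single underscores."""
    return _SNAKE_CASE.fullmatch(name) is not None


def to_snake_case(name: str) -> str:
    """Replace every run of spaces, hyphens or other separators with one underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


print(is_snake_case("reversal_learning_ALM_inhibition"))  # True
print(is_snake_case("Final Experiment"))                   # False
print(to_snake_case("Final Experiment"))                   # Final_Experiment
```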

#todo: how to disable testmode

Open the GUIs via Python#

To open the GUIs (commanders) via Python, you have two possibilities:

Import the specific commander and start the GUI via the start_gui() function.
Within your conda environment, run python AdminCommander.py in the datastructure_tools path.

Below you can find example usages for the commanders using the first possibility.

from datastructure_tools import AdminCommander
# to run the admin commander, use this command
AdminCommander.start_gui()

[2024-02-16 15:12:40,261][INFO]: Connecting optouser@10.4.26.150:3306
[2024-02-16 15:12:41,149][INFO]: Connected optouser@10.4.26.150:3306

from datastructure_tools import AnimalCommander
# to add animals, run the following GUI
AnimalCommander.start_gui()

from datastructure_tools import SessionCommander
# let's add a session! Just run the following GUI for that:
SessionCommander.start_gui()

from datastructure_tools import FileCommander
# to open the file commander, run the following GUI
FileCommander.start_gui()
# Inside this GUI, add all the files you want to copy. When you are done, press add and then run.
# The file commander will then copy all the files and report on the lower left side whether it was successful.

from datastructure_tools import EquipmentCommander
# you can add electrodes using the equipment commander
EquipmentCommander.start_gui()

from datastructure_tools import VirusCommander
# to add viruses, use the virus commander
VirusCommander.start_gui()

from datastructure_tools import SurgeryCommander
# in case you want to insert a surgery, simply run the surgery commander
SurgeryCommander.start_gui()

from datastructure_tools import WeightCommander
# for adding weights, we included the weight commander:
WeightCommander.start_gui()

Writing your own code accessing the database#

Here comes the fun part! It makes sense to familiarize yourself with the DataJoint tutorial first: https://datajoint.com/docs/core/datajoint-python/0.14/ If you plan to write code that uses the database, we provide you with an easy access point: the DataBaseAccess() class. The following section contains a few commands to get you started with programming against the new database.

# we start by importing the DataBaseAccess module, from which we can instantiate the class below
from datastructure_tools import DataBaseAccess

# now we instantiate the class
DB = DataBaseAccess.DataBaseAccess()
# With this command, we have now loaded the DataBaseAccess class. If everything went well,
# the output should be an info message that you are now connected as user @ dbip

# using the DB class, you can also check e.g. the server that you are connected to
print('the server is {}'.format(DB.cfg['SQLserver_cfg']['host']))
# and the database name
print('the database name is {}'.format(DB.database_name))

[2024-02-16 15:55:10,358][INFO]: Connecting optouser@10.4.26.150:3306
[2024-02-16 15:55:10,956][INFO]: Connected optouser@10.4.26.150:3306

the server is 10.4.26.150
the database name is opto_db

If you see the following output: No valid root directory found (from ['/mnt/diester/archive/projects', 'O:\archive\projects', '/mnt/server/archive/projects', '/media/server/archive/projects']) Connection to server is not possible!!! this means that either

you are not connected to the file server, or
you are not connected to the DB server.

The first problem is usually caused by where the file server is mounted (on Windows it has to be O:\ to work). The second problem can arise if you are not in the university network or if the DB server is down. Check that you are in the university network. If you are and the problem still occurs, open the Admin Commander and reconnect to the DB server. If the problem persists, talk to us!
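
To tell the two causes apart, both connections can be checked by hand. The following is a minimal sketch under stated assumptions, not the check that datastructure_tools itself performs: the candidate folders are copied from the error message above, the port 3306 is taken from the connection log, and the function names `find_project_root` and `can_reach` are hypothetical.

```python
import socket
from pathlib import Path

# Candidate mount points, copied from the error message above.
CANDIDATE_ROOTS = [
    "/mnt/diester/archive/projects",
    r"O:\archive\projects",
    "/mnt/server/archive/projects",
    "/media/server/archive/projects",
]


def find_project_root(candidates=CANDIDATE_ROOTS):
    """Return the first candidate that exists as a directory, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


def can_reach(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `find_project_root()` returns None, the file server is probably not mounted where the toolbox expects it. If `can_reach("10.4.26.150")` returns False, the DB server is not reachable from your machine, for example because you are outside the university network.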

Accessing Info from the database#

To access info from the database, it is enough to call the specific table.

#now we can access info from the database.
DB.Session

All sessions with corresponding infos.

session_id (yyyymmdd_m0042_wt_hhmm)	animal_id (unique animal id)	session_datetime (datetime of the session)	session_dir (path to the data directory for a session)
20230126_m1018_wt_1752	m1018_wt	2023-01-26 17:52:38	2023_intercontext/PICAST/data/0_raw/20230126_m1018_wt_1752
20230126_m1019_wt_1853	m1019_wt	2023-01-26 18:53:06	2023_intercontext/PICAST/data/0_raw/20230126_m1019_wt_1853
20230126_m1020_wt_1942	m1020_wt	2023-01-26 19:42:42	2023_intercontext/PICAST/data/0_raw/20230126_m1020_wt_1942
20230126_m1020_wt_2038	m1020_wt	2023-01-26 20:38:41	2023_intercontext/PICAST/data/0_raw/20230126_m1020_wt_2038
20230127_m1018_wt_1053	m1018_wt	2023-01-27 10:53:39	2023_intercontext/PICAST/data/0_raw/20230127_m1018_wt_1053
20230127_m1018_wt_1349	m1018_wt	2023-01-27 13:49:56	2023_intercontext/PICAST/data/0_raw/20230127_m1018_wt_1349
20230127_m1019_wt_1133	m1019_wt	2023-01-27 11:33:07	2023_intercontext/PICAST/data/0_raw/20230127_m1019_wt_1133
20230127_m1019_wt_1500	m1019_wt	2023-01-27 15:00:54	2023_intercontext/PICAST/data/0_raw/20230127_m1019_wt_1500
20230127_m1020_wt_1240	m1020_wt	2023-01-27 12:40:33	2023_intercontext/PICAST/data/0_raw/20230127_m1020_wt_1240
20230127_m1020_wt_1629	m1020_wt	2023-01-27 16:29:31	2023_intercontext/PICAST/data/0_raw/20230127_m1020_wt_1629
20230130_m1018_wt_1641	m1018_wt	2023-01-30 16:41:02	2023_intercontext/PICAST/data/0_raw/20230130_m1018_wt_1641
20230130_m1019_wt_1736	m1019_wt	2023-01-30 17:36:57	2023_intercontext/PICAST/data/0_raw/20230130_m1019_wt_1736

...

Total: 1650

# You can now see a table of every session that has been added, with all the information attached to it.
# To work with this information in code, we can fetch it as dictionaries using the fetch command.

session_dict_list = DB.Session.fetch(as_dict=True)

# The information is fetched as a list of dictionaries. E.g. the first session in our database can be accessed
# by indexing this list
session_dict_list[0]

{'session_id': '20230204_r0001_wt_1222',
 'animal_id': 'r0001_wt',
 'session_datetime': datetime.datetime(2023, 2, 4, 12, 22, 58),
 'session_dir': 'NewProject/Final Experiment/data/0_raw/20230204_r0001_wt_1222',
 'session_note': 'Test Session'}

# to get the session id, we can run the following:
session_dict_list[0]['session_id']

'20230204_r0001_wt_1222'
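
The session id itself carries information: judging from the column description `yyyymmdd_m0042_wt_hhmm`, it appears to be built from the date, the animal id and the start time. The following sketch splits an id into these parts and builds one back. It assumes that the animal id is everything between the first and the last underscore; `parse_session_id` and `build_session_id` are hypothetical helper names, not functions of datastructure_tools.

```python
from datetime import datetime


def parse_session_id(session_id):
    """Split 'yyyymmdd_<animal_id>_hhmm' into (animal_id, datetime)."""
    date_part, rest = session_id.split("_", 1)
    animal_id, time_part = rest.rsplit("_", 1)
    started = datetime.strptime(date_part + time_part, "%Y%m%d%H%M")
    return animal_id, started


def build_session_id(animal_id, session_datetime):
    """Build an id in the same pattern; seconds are not part of the id."""
    return f"{session_datetime:%Y%m%d}_{animal_id}_{session_datetime:%H%M}"


animal_id, started = parse_session_id("20230204_r0001_wt_1222")
print(animal_id)  # r0001_wt
print(started)    # 2023-02-04 12:22:00
print(build_session_id("r0001_wt", datetime(2023, 2, 4, 12, 22, 58)))  # 20230204_r0001_wt_1222
```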

# the database access class provides all kinds of useful functions. For example it gives you a function to get the 
# session path of a certain session via the get_session_directory function
print(DB.get_session_directory(session_dict_list[0]))

/mnt/server/archive/projects/datastructure_test/NewProject/Final Experiment/data/0_raw/20230204_r0001_wt_1222

# As you can see, this prints the full folder based on the server path that works on your specific computer.
# This provides an easy way to access data within a session. The result is returned as a Path object, so
# you can easily append subfolders to it regardless of your operating system (for more info, see Path from pathlib)
DB.get_session_directory(session_dict_list[0]) / '0_raw' / 'ephys'

PosixPath('/mnt/server/archive/projects/datastructure_test/NewProject/Final Experiment/data/0_raw/20230204_r0001_wt_1222/0_raw/ephys')
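
Why a Path object helps across operating systems can be seen without a server connection. This sketch uses the pure path classes from pathlib, which only manipulate path strings. The two root folders are taken from the candidate list in the error message shown earlier, and the session folder is the relative `session_dir` from the example above. On your machine the root may differ, so treat these as illustrative values.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Relative session folder as stored in the database (from the example above).
session_dir = "NewProject/Final Experiment/data/0_raw/20230204_r0001_wt_1222"

# Illustrative mount points for Linux and Windows.
linux_root = PurePosixPath("/mnt/server/archive/projects")
windows_root = PureWindowsPath("O:/archive/projects")

# The same "/" operator produces the correct separators on each system.
print(linux_root / session_dir / "ephys")
# /mnt/server/archive/projects/NewProject/Final Experiment/data/0_raw/20230204_r0001_wt_1222/ephys
print(windows_root / session_dir / "ephys")
# O:\archive\projects\NewProject\Final Experiment\data\0_raw\20230204_r0001_wt_1222\ephys
```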

# You can also query the database and e.g. get all the sessions done with a particular rat.
# We strongly recommend looking at the DataJoint tutorial, but here is an example to get you hyped:
(DB.Session & (DB.Animal & 'animal_id = "r0001_wt"')).fetch(as_dict=True)

[{'session_id': '20230204_r0001_wt_1222',
  'animal_id': 'r0001_wt',
  'session_datetime': datetime.datetime(2023, 2, 4, 12, 22, 58),
  'session_dir': 'NewProject/Final Experiment/data/0_raw/20230204_r0001_wt_1222',
  'session_note': 'Test Session'}]

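# or all the sessions associated with a certain user, displayed as a table
# (to fetch them instead, wrap the whole expression in parentheses before calling .fetch(as_dict=True))
DB.Session & (DB.User & 'user = "fs539"')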

All sessions with corresponding infos.

session_id (yyyymmdd_m0042_wt_hhmm)	animal_id (unique animal id)	session_datetime (datetime of the session)	session_dir (path to the data directory for a session)	session_note (some notes about the session)
20230204_r0001_wt_1222	r0001_wt	2023-02-04 12:22:58	NewProject/Final Experiment/data/0_raw/20230204_r0001_wt_1222	Test Session

Total: 1
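
Once query results are fetched as a list of dictionaries, plain Python is enough to regroup them. The sketch below does not touch the database: the rows are copied from the session table shown earlier in this tutorial (reduced to three fields) and stand in for the result of a `fetch(as_dict=True)` call.

```python
from collections import defaultdict
from datetime import datetime

# Stand-in for a fetched result; rows copied from the session table above.
sessions = [
    {"session_id": "20230126_m1018_wt_1752", "animal_id": "m1018_wt",
     "session_datetime": datetime(2023, 1, 26, 17, 52, 38)},
    {"session_id": "20230126_m1019_wt_1853", "animal_id": "m1019_wt",
     "session_datetime": datetime(2023, 1, 26, 18, 53, 6)},
    {"session_id": "20230127_m1018_wt_1053", "animal_id": "m1018_wt",
     "session_datetime": datetime(2023, 1, 27, 10, 53, 39)},
    {"session_id": "20230127_m1018_wt_1349", "animal_id": "m1018_wt",
     "session_datetime": datetime(2023, 1, 27, 13, 49, 56)},
]

# Group the session ids by animal.
by_animal = defaultdict(list)
for row in sessions:
    by_animal[row["animal_id"]].append(row["session_id"])

for animal_id, ids in sorted(by_animal.items()):
    print(animal_id, len(ids), ids)
# m1018_wt 3 ['20230126_m1018_wt_1752', '20230127_m1018_wt_1053', '20230127_m1018_wt_1349']
# m1019_wt 1 ['20230126_m1019_wt_1853']

# Find the most recent session.
latest = max(sessions, key=lambda row: row["session_datetime"])
print(latest["session_id"])  # 20230127_m1018_wt_1349
```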

# There is also a class that simplifies adding sessions from your favorite experiment. 
# It is part of the utils file
from datastructure_tools import utils

#within the utils file we have the session class
# in the following documentation, you can see how to use it
utils.SessionClass?

Init signature: utils.SessionClass(DB, test=False, *args, **kwargs)
Docstring:     
class abstracting all infos about the session
can be reused in functions

Required inputs:
DB -  handle to DataBaseAccess
animal_id -> str animal id in DB, will be checked
session_datetime
session_note -> str can be not provided
project -> str project in DB, will be checked
user -> str user in DB, will be checked
expName -> str experiment name folder
experiment_template -> str experiment_type name will be checked

how to use:
session = SessionClass(self.DB, animal_id=AnimalName2use, session_datetime=session_datetime,
                           session_note=self.Session_note.toPlainText(), project=self.Project_combo.currentText(),
                           user=self.User_combo.currentText(), expName=self.ExpName_combo.currentText(),
                           experiment_template = self.ExpType_combo.currentText(),test = BOOL if a test session)

pathcreationSuccess = session.createSession_path() # create Paths on server
# TRUE if worked
PushSuccess = session.checkInputs() # checks inputs and pushes to DB
% TRUE if worked
session.weight = float(self.AnimalWeightEdit.text())
session.weight_note = self.WeightNoteEdit.text()
session.pushWeights()
File:           d:\code\datastructure_tools\datastructure_tools\utils.py
Type:           type
Subclasses:     
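
The usage example in the docstring is written from inside a GUI (it reads values from widgets via `self`). Outside a GUI, the same calls could be wrapped as in the following sketch. It relies only on what the docstring states: the keyword arguments of the constructor, and that `createSession_path()` and `checkInputs()` return True on success. The wrapper name `add_session` and its error handling are assumptions, not part of datastructure_tools, and the session class is passed in as an argument so that the sketch can be read and tried without a database connection.

```python
def add_session(session_cls, DB, animal_id, session_datetime, project, user,
                exp_name, experiment_template, session_note="", test=True):
    """Create the session folders and push the session to the database.

    session_cls is expected to be utils.SessionClass; test=True marks the
    session as a test session, as described in the docstring above.
    """
    session = session_cls(
        DB,
        animal_id=animal_id,
        session_datetime=session_datetime,
        session_note=session_note,
        project=project,
        user=user,
        expName=exp_name,
        experiment_template=experiment_template,
        test=test,
    )
    # Per the docstring, both calls return True if they worked.
    if not session.createSession_path():
        raise RuntimeError("could not create the session folders on the file server")
    if not session.checkInputs():
        raise RuntimeError("inputs were rejected or could not be pushed to the database")
    return session
```

With a live connection, a call would look like `add_session(utils.SessionClass, DB, animal_id="r0001_wt", session_datetime=datetime.now(), project=..., user=..., exp_name=..., experiment_template=...)`, where the project, user, experiment and template have to exist in the database already.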

This tutorial is meant to ease your entry into the datastructure tools.#

It is meant to become more comprehensive over time. Please give me some feedback about what you are missing and what else you would like to see in the tutorial! For now: Happy coding!