eLabFTW crawler#
The crawler is Python code that accesses eLabFTW via its API. Everything that can be done in the browser can also be done via the API.
Introduction to the API: https://doc.elabftw.net/api.html
Description of the API endpoints: https://doc.elabftw.net/api/v2/
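A minimal sketch of an authenticated API call using only the Python standard library. The host `elab.example.org` and the helper names `build_request` and `fetch_json` are placeholders, not part of the crawler. Sending the API key in the `Authorization` header and the `/api/v2/` prefix follow the API documentation linked above, which should be consulted for the exact endpoints and parameters.

```python
import json
import urllib.request

# Placeholder host; replace with the address of the real eLabFTW instance.
API_ROOT = "https://elab.example.org/api/v2"


def build_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Prepare an authenticated GET request for an API v2 endpoint."""
    url = f"{API_ROOT}/{endpoint.lstrip('/')}"
    headers = {"Authorization": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def fetch_json(endpoint: str, api_key: str):
    """Send the request and decode the JSON body of the response."""
    request = build_request(endpoint, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_json("items", api_key)` would request the resources (called items in the API) visible to the key's owner.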
Where is the crawler#
crawler code
At the moment, the crawler is planned to run on a virtual BW-Cloud instance. To access the instance, use ssh:
ssh -i test_schluesser.pem ubuntu@10.20.xx.xx
The key file test_schluesser.pem and the IP address must be obtained from a secure source.
Requirements for the instance:#
Ubuntu 24.04
inside the university firewall
HTTPS and MySQL ports open
How to set up:#
install miniconda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
create conda env
conda create -n crawler python=3.9
install the de_DE.UTF8 locale, as the BW-Cloud instance does not have it by default
sudo apt-get install language-pack-de
sudo dpkg-reconfigure locales
High-level explanation of how the crawler works#
This section describes the logic implemented in the crawler.
Access to DataJoint#
Access is set up in the same way as for an individual user (see here). However, as no graphical interface is available on the instance, server_settings.json and user_settings.json need to be edited with a command-line editor such as nano.
nano ./datastructure_tools/DataJoint/server_settings.json
nano ./datastructure_tools/user_settings.json
Ctrl+O to save
Ctrl+X to exit
Alternatively, copy the local files to the instance with scp:
scp -i crawler_keypair.pem /path2/user_config.json ubuntu@10.20.xx.xx:~/DataStructure_tools/datastructure_tools/
scp -i crawler_keypair.pem /path2/datastructure_tools/DataJoint/server_config.json ubuntu@10.20.xx.xx:~/DataStructure_tools/datastructure_tools/DataJoint/server_config.json
Datastructure tools are installed in a conda environment. TODO: maybe move to a Docker container?
Scheduled task#
The crawler runs a scheduler that executes different jobs at different intervals:
every minute: new entries from surgeries, animal sheets, and perfusions
every 2 minutes: animals, viruses, and updates to all templates with the newest state of the animals
every 5 minutes: the crawler signs a “healthentry” in a resource of category Maintenance, which is used to visualize the crawler's status
every day: the crawler creates AnimalSheets and uploads them to the corresponding animal
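The text does not say which scheduling library the crawler uses, so the following is only a standard-library sketch of interval-based scheduling. `IntervalScheduler`, `run_forever`, and the job names in the usage note are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    name: str
    interval_s: float
    func: Callable[[], None]
    next_run: float


class IntervalScheduler:
    """Runs each registered job whenever its interval has elapsed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._jobs: List[Job] = []

    def every(self, interval_s: float, name: str, func: Callable[[], None]) -> None:
        """Register a job; its first run is one interval from now."""
        self._jobs.append(Job(name, interval_s, func, self._clock() + interval_s))

    def run_pending(self) -> List[str]:
        """Run all due jobs once and return their names in registration order."""
        now = self._clock()
        ran = []
        for job in self._jobs:
            if now >= job.next_run:
                job.func()
                job.next_run = now + job.interval_s
                ran.append(job.name)
        return ran


def run_forever(scheduler: IntervalScheduler, poll_s: float = 1.0) -> None:
    """Blocking loop for the crawler process."""
    while True:
        scheduler.run_pending()
        time.sleep(poll_s)
```

With this sketch, the intervals listed above would be registered as `scheduler.every(60, "new_entries", ...)`, `scheduler.every(120, "animals_viruses", ...)`, `scheduler.every(300, "health_entry", ...)`, and `scheduler.every(86400, "animal_sheets", ...)`.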
Crawling for animals#
read current animals in DB
read entries in eLabFTW (of resource category animals)
load the meta fields
check:
delivery date is entered
animal_nr is entered and is int
age at delivery is a float
user is entered; if not, the creator of the entry is taken as the user
checks that the user has an orgid (RZ-Kürzel, the computing-centre user ID)
the animal note is taken from the main text body
if there are no errors so far, tries to push to the DB (DB.Animal)
if errors, comments on the entry
if push successful, changes title to animal_id
adds tva as a tag to the entry
comments success
logs entry
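The checks listed above can be sketched as a single validation function that collects all errors, so that they can be posted as one comment on the entry. The function name `validate_animal`, the meta field keys (`delivery_date`, `animal_nr`, `age_at_delivery`, `user`), and the `orgids` lookup table are assumptions made for illustration; the real names live in the crawler code.

```python
from typing import Any, Dict, List, Tuple


def _text(meta: Dict[str, Any], key: str) -> str:
    """Return a meta field as stripped text, or '' if it is missing."""
    value = meta.get(key)
    return "" if value is None else str(value).strip()


def validate_animal(
    meta: Dict[str, Any], creator: str, orgids: Dict[str, str], note: str = ""
) -> Tuple[Dict[str, Any], List[str]]:
    """Check the meta fields of one animal entry.

    Returns the record to push and a list of error messages. The record
    should only be pushed to the DB if the error list is empty.
    """
    errors: List[str] = []
    record: Dict[str, Any] = {"note": note}  # note comes from the main text body

    delivery_date = _text(meta, "delivery_date")
    if delivery_date:
        record["delivery_date"] = delivery_date
    else:
        errors.append("delivery date is missing")

    try:
        record["animal_nr"] = int(_text(meta, "animal_nr"))
    except ValueError:
        errors.append("animal_nr is missing or not an integer")

    try:
        record["age_at_delivery"] = float(_text(meta, "age_at_delivery"))
    except ValueError:
        errors.append("age at delivery is missing or not a number")

    # Fall back to the creator of the entry if no user was entered.
    user = _text(meta, "user") or creator
    if user in orgids:
        record["user"] = user
        record["orgid"] = orgids[user]
    else:
        errors.append(f"user {user!r} has no orgid")

    return record, errors
```

Collecting every error before commenting means the user sees all problems with an entry at once instead of fixing them one crawl cycle at a time.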
Crawling for viruses#
reads viruses in DB
read entries in eLabFTW (of resource category virus)
load the meta fields
check:
delivery date is entered
if no errors so far, tries to push to DB DB.Virus
if errors, comments on the entry
if push successful, changes title to virus#batch_nr
adds tva as a tag to the entry
comments success
logs entry
Crawling for behavioral entries#
crawl4Sessions
reads entries in eLabFTW (of resource category Behavior and locked!)
checks if entry was already crawled (DB entry)
loads the meta fields
creates a session_id
DB.Session
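The first three steps above (select locked entries, skip those already crawled, load the meta fields) are shared by most crawl functions on this page and can be sketched as follows. The helper names are hypothetical, and the JSON keys `locked`, `metadata`, and `extra_fields` reflect an assumed shape of the entries returned by the API; they should be verified against the API documentation.

```python
import json
from typing import Any, Dict, Iterable, List, Set


def entries_to_crawl(
    entries: Iterable[Dict[str, Any]], crawled_ids: Set[int]
) -> List[Dict[str, Any]]:
    """Keep only locked entries that have not been crawled yet."""
    return [
        entry
        for entry in entries
        if entry.get("locked") and entry["id"] not in crawled_ids
    ]


def extra_field_values(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the meta fields of an entry to {field name: value}."""
    raw = entry.get("metadata") or "{}"
    # The metadata may arrive as a JSON string or as an already decoded dict.
    metadata = json.loads(raw) if isinstance(raw, str) else raw
    fields = metadata.get("extra_fields", {})
    return {name: spec.get("value") for name, spec in fields.items()}
```

Only locked entries are crawled, which presumably keeps half-finished entries out of the DB until the user signals that they are complete.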
Crawling for perfusions#
crawl4Perfusions
reads entries in eLabFTW (of resource category Perfusion and locked!)
checks if entry was already crawled (DB entry)
loads the meta fields
checks for animal_id
pushes to DB DB.Animal.Death
Creates animalsheet entry with some pre-filled fields:
procedure: f"Euthanasia,{perfusion_d['drugs']} Perfusion"
weiterleben: 'Toetung'
Crawling for water deprivations#
crawl4Waterdeprivation
read entries in eLabFTW (of resource category WaterDeprivation and locked!)
checks if entry was already crawled (DB entry)
loads the meta fields
pushes to DB: DB.ZWR
Creates animalsheet entry with some pre-filled fields:
procedure: f'Kontrolle und Beginn der Wasserdeprivation, ZWR:{ZWR}'
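As a small illustration, the pre-filled animalsheet fields for perfusions and water deprivations can be built as plain dictionaries. The function names are hypothetical; the field names and format strings are the ones given in this document.

```python
from typing import Any, Dict


def perfusion_sheet_fields(perfusion_d: Dict[str, Any]) -> Dict[str, str]:
    """Pre-filled animalsheet fields for a perfusion entry."""
    return {
        "procedure": f"Euthanasia,{perfusion_d['drugs']} Perfusion",
        "weiterleben": "Toetung",
    }


def water_deprivation_sheet_fields(ZWR: Any) -> Dict[str, str]:
    """Pre-filled animalsheet fields for a water deprivation entry."""
    return {"procedure": f"Kontrolle und Beginn der Wasserdeprivation, ZWR:{ZWR}"}
```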
Crawling for animal sheets#
crawl4AnimalSheetEntries
read entries in eLabFTW (of resource category AnimalDocumentation and locked!)
checks if entry was already crawled (DB entry)
loads the meta fields
pushes to DB DB.AnimalSheetEntry
Crawling for surgeries#
crawl4Surgeries
read entries in eLabFTW (of resource category Surgery and locked!)
checks if entry was already crawled (DB entry)
load the meta fields
check:
surgery date is entered
animal_nr is entered and is int
age at delivery is a float
user entered
checks if the entry is already in the DB via animal_id and surgery datetime
pushes to DB DB.Surgery
logs idx as crawled
puts animal_id as tag
links the animal to the entry
puts tva as tag if available
creates an animalsheet entry in DB and eLabFTW (links to this entry)
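The duplicate check via animal_id and surgery datetime can be sketched as a lookup in a set of keys. The function names and the dictionary keys `animal_id` and `surgery_datetime` are assumptions, as is the ISO format of the datetime string.

```python
from datetime import datetime
from typing import Any, Dict, Set, Tuple

SurgeryKey = Tuple[str, datetime]


def surgery_key(animal_id: str, surgery_datetime: str) -> SurgeryKey:
    """Build a comparable key; parsing normalizes 'T' vs. space separators."""
    return (str(animal_id), datetime.fromisoformat(surgery_datetime))


def is_new_surgery(candidate: Dict[str, Any], existing: Set[SurgeryKey]) -> bool:
    """True if no surgery with the same animal and datetime is in the DB."""
    key = surgery_key(candidate["animal_id"], candidate["surgery_datetime"])
    return key not in existing
```

Comparing parsed datetimes instead of raw strings avoids false negatives when the same timestamp is formatted differently in eLabFTW and in the DB.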
Pushes#
At initialization, the crawler can push the current state of the DB to eLabFTW. The templates for individual entries are designed so that the fields have the same names as in the DB, which allows easy exchange via dictionaries.
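Because template fields and DB columns share their names, the exchange reduces to dictionary comprehensions. This is a sketch under the assumption that a template is a mapping from field name to a spec dictionary with a `value` key; both function names are hypothetical.

```python
from typing import Any, Dict, Iterable


def db_row_to_fields(
    row: Dict[str, Any], template: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Copy the template and fill each field from the DB column of the same name."""
    return {
        name: {**spec, "value": "" if row.get(name) is None else str(row[name])}
        for name, spec in template.items()
    }


def fields_to_db_row(
    fields: Dict[str, Dict[str, Any]], db_columns: Iterable[str]
) -> Dict[str, Any]:
    """Pick the values of all fields that have a DB column of the same name."""
    return {
        name: fields[name].get("value") for name in db_columns if name in fields
    }
```

No explicit mapping table is needed in either direction; a field that has no matching DB column is simply ignored.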
Categories#
Categories are created from the classes ResourceCategories and ExperimentCategories. get_expCategory_id returns the category ID, creating the category if it does not exist yet. Individual categories are subclasses of the corresponding class. This ensures consistent naming and allows the names to be used as enums.
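A minimal sketch of the pattern, not the actual implementation: category names are derived from subclass names, and a get-or-create helper resolves them to IDs. The helper `get_or_create_category_id`, its parameters, and the two example subclasses are hypothetical; in the crawler, the lookup and creation would go through the eLabFTW API.

```python
from typing import Callable, Dict, List, Type


class ResourceCategories:
    """Base class; the name of every subclass is one resource category."""

    @classmethod
    def names(cls) -> List[str]:
        return [sub.__name__ for sub in cls.__subclasses__()]


class Animals(ResourceCategories):
    pass


class Virus(ResourceCategories):
    pass


def get_or_create_category_id(
    category: Type[ResourceCategories],
    known_ids: Dict[str, int],
    create: Callable[[str], int],
) -> int:
    """Return the ID of a category, creating the category on first use."""
    name = category.__name__
    if name not in known_ids:
        # `create` stands in for the API call that creates the category.
        known_ids[name] = create(name)
    return known_ids[name]
```

Referring to `Animals` instead of the string "Animals" means a misspelled category fails immediately with a NameError instead of silently creating a second category.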
Templates#
Templates can be designed in the browser or programmatically. Here we use classes such as WaterDeprivationFields, in which subclasses represent the individual meta fields to be entered. Properties such as the field type are set from FieldTypes. Important: the class name needs to correspond to the field name in the DB if the field is to be filled from or transferred to the DB. The name displayed to the user is defined in the property name. The order of the fields can be defined via the property position (1-based). The property required indicates to the user that a field is required but does not enforce it.
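The following sketch shows one way such a field class could look, reading "subclasses" as classes nested inside the template class. The members of `FieldTypes`, the two example fields, their labels, and the helper `template_fields` are assumptions for illustration only.

```python
import inspect
from enum import Enum
from typing import Any, Dict, List


class FieldTypes(str, Enum):
    # Assumed members; the real FieldTypes may define other types.
    TEXT = "text"
    NUMBER = "number"


class WaterDeprivationFields:
    # Declared out of order on purpose: `position` decides the display order.
    class ZWR:  # class name = field name in the DB
        name = "ZWR"  # label shown to the user
        type = FieldTypes.NUMBER
        position = 2
        required = False

    class animal_id:
        name = "Animal ID"
        type = FieldTypes.TEXT
        position = 1
        required = True


def template_fields(template_cls: type) -> List[Dict[str, Any]]:
    """Collect the nested field classes of a template, ordered by position."""
    fields = [
        {
            "db_field": attr,
            "label": member.name,
            "type": member.type.value,
            "position": member.position,
            "required": member.required,
        }
        for attr, member in vars(template_cls).items()
        if inspect.isclass(member)
    ]
    return sorted(fields, key=lambda field: field["position"])
```

Keeping the DB field name in the class name and the display label in `name` lets the label change without breaking the transfer to the DB.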
Pushing animals from DB to eLabFTW#
Template: AnimalFields from eLabFTW_api_utils. The function pushAnimalsDB2ELN pushes the animal entries that are in the DB to eLabFTW:
reads animals from DB
reads animals from eLabFTW (Resources of category Animals); the title of each entry is taken as the animal_id
checks whether the animal is already in eLabFTW
fills the template fields from DB
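The comparison step can be sketched as a set lookup on entry titles. The function name and the dictionary keys `animal_id` and `title` are assumptions about the shape of the rows and entries.

```python
from typing import Any, Dict, Iterable, List


def animals_missing_in_eln(
    db_animals: Iterable[Dict[str, Any]], eln_entries: Iterable[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return the DB animals for which no eLabFTW entry exists yet.

    An animal counts as present if some entry title equals its animal_id.
    """
    existing_titles = {str(entry["title"]).strip() for entry in eln_entries}
    return [
        animal
        for animal in db_animals
        if str(animal["animal_id"]) not in existing_titles
    ]
```

Only the animals returned here need a new entry, so running the push repeatedly does not create duplicates.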
Pushing viruses from DB to eLabFTW#
Template: VirusFields from eLabFTW_api_utils. The function pushVirusDB2ELN pushes the virus entries that are in the DB to eLabFTW:
reads viruses from DB
reads viruses from eLabFTW (Resources of category Virus); reads virus and batch from the title f"{virus['virus']}#{virus['batch_nr']}"
checks whether the virus is already in eLabFTW
fills the template fields from DB
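The title convention can be sketched as a pair of helper functions for building and parsing titles. Both function names are hypothetical; only the format string comes from this document.

```python
from typing import Any, Dict, Tuple


def virus_title(virus: Dict[str, Any]) -> str:
    """Build the entry title from virus name and batch number."""
    return f"{virus['virus']}#{virus['batch_nr']}"


def parse_virus_title(title: str) -> Tuple[str, str]:
    """Split a title into (virus, batch_nr) at the last '#'."""
    name, separator, batch_nr = title.rpartition("#")
    if not separator or not name or not batch_nr:
        raise ValueError(f"title {title!r} is not of the form virus#batch_nr")
    return name, batch_nr
```

Splitting at the last `#` keeps the parser working for virus names that themselves contain a `#`. The batch number is returned as text, since the document does not state its type.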
written by: Artur
last modified: 2024-02-22