eLabFTW crawler#

The crawler is Python code that accesses eLabFTW via its API. Every function available in the browser can also be performed via the API.
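
As an illustration, a minimal sketch of such API access against the v2 REST API using the requests package; the instance URL and the API key are placeholders:

import requests

# Placeholder values; the real URL and key come from your eLabFTW instance.
ELAB_URL = "https://elabftw.example.org/api/v2"
API_KEY = "your-api-key"

# eLabFTW expects the API key directly in the Authorization header.
headers = {"Authorization": API_KEY}

# List resources ("items" in the API), e.g. to find animal entries.
response = requests.get(f"{ELAB_URL}/items", headers=headers)
response.raise_for_status()
for item in response.json():
    print(item["id"], item["title"])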

Where is the crawler#

crawler code. At the moment it is planned that the crawler runs on a virtual BW-Cloud instance. To access the instance, use ssh:

ssh -i test_schluesser.pem ubuntu@10.20.xx.xx

test_schluesser.pem and the IP address are to be acquired from safe sources.

Requirements for the instance:#

  • Ubuntu 24.04

  • inside the university firewall

  • HTTPS and MySQL ports open

How to set up:#

  • install miniconda

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash

  • create a conda env

conda create -n crawler python=3.9

Install the de_DE.UTF8 locale, as BW-Cloud does not provide it by default:

sudo apt-get install language-pack-de
sudo dpkg-reconfigure locales
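
To verify from Python that the locale is now available (the assumption here is that the crawler relies on it, e.g. for German date or number formatting):

import locale

# Raises locale.Error if de_DE.UTF8 was not generated on the instance.
locale.setlocale(locale.LC_ALL, "de_DE.UTF8")
print(locale.getlocale())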

High-level explanation of how the crawler works#

The following describes the implemented logic of the crawler.

Access to DataJoint#

Access is done in the same way as for an individual user (see here). However, as no graphical interface is available, server_settings.json and user_settings.json need to be edited with a command-line editor such as nano:

nano ./datastructure_tools/DataJoint/server_settings.json
nano ./datastructure_tools/user_settings.json

Ctrl+O to save
Ctrl+X to exit

Alternatively, the local files can be copied over via scp:

scp -i crawler_keypair.pem  /path2/user_config.json ubuntu@10.20.xx.xx:~/DataStructure_tools/datastructure_tools/
scp -i crawler_keypair.pem /path2/datastructure_tools/DataJoint/server_config.json ubuntu@10.20.xx.xx:~/DataStructure_tools/datastructure_tools/DataJoint/server_config.json 
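
Once the settings files are in place, the DataJoint connection can be tested from Python. A minimal sketch using DataJoint's standard configuration API; host and credentials are placeholders (the crawler itself reads them from the settings files above):

import datajoint as dj

# Placeholder credentials; the crawler reads these from
# server_settings.json / user_settings.json instead.
dj.config["database.host"] = "mysql.example.org"
dj.config["database.user"] = "crawler"
dj.config["database.password"] = "secret"

# Establish the connection; raises an error if host or credentials are wrong.
conn = dj.conn()
print(conn.is_connected)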

The datastructure tools are installed in a conda environment. TODO: maybe move to a Docker container?

Scheduled tasks#

The crawler runs a scheduler to execute different jobs at different time points (see the sketch after the list):

  • every minute: new entries from surgeries, animal sheets, perfusions

  • every 2 minutes: animals, viruses, and updates to all templates with the newest state of the animals

  • every 5 minutes: the crawler signs a "healthentry" (a resource of the Maintenance category); this is used to visualize the status

  • every day: the crawler creates AnimalSheets and uploads them to the corresponding animal entries
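
The scheduling library the crawler uses is not named here; a minimal sketch of the timetable above using the schedule package, with stub functions standing in for the crawler routines described in the following sections:

import time
import schedule

# Stubs standing in for the crawler routines described below.
def crawl_new_entries():      # surgeries, animal sheets, perfusions
    print("crawl surgeries / animal sheets / perfusions")

def crawl_animals_viruses():  # animals, viruses, template updates
    print("crawl animals / viruses, update templates")

def sign_health_entry():      # Maintenance-category health entry
    print("sign healthentry")

def create_animal_sheets():   # daily AnimalSheet creation and upload
    print("create and upload AnimalSheets")

schedule.every(1).minutes.do(crawl_new_entries)
schedule.every(2).minutes.do(crawl_animals_viruses)
schedule.every(5).minutes.do(sign_health_entry)
schedule.every().day.do(create_animal_sheets)

while True:
    schedule.run_pending()
    time.sleep(1)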

Crawling for animals#

  • reads current animals in the DB

  • reads entries in eLabFTW (of resource category Animals)

  • loads the meta fields

  • check:

    • delivery date is entered

    • animal_nr is entered and is int

    • age at delivery is a float

    • user is entered; if not, the creator is taken as the user

      • checks that the user has an orgid (RZ-Kürzel)

    • the animal note is taken from the main text body

  • if no errors so far, tries to push to the DB (DB.Animal)

  • if errors, comments on the entry

  • if push successful, changes title to animal_id

  • adds tva as a tag to the entry

  • comments success

  • logs entry
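
A hedged sketch of the validation step in this list; the field names follow the bullets above, but the helper function and its error texts are assumptions, not the crawler's actual code:

def validate_animal_meta(meta: dict) -> list:
    """Collect validation errors for an animal entry's meta fields."""
    errors = []
    if not meta.get("delivery_date"):
        errors.append("delivery date is missing")
    try:
        int(meta["animal_nr"])
    except (KeyError, TypeError, ValueError):
        errors.append("animal_nr is missing or not an int")
    try:
        float(meta["age_at_delivery"])
    except (KeyError, TypeError, ValueError):
        errors.append("age at delivery is missing or not a float")
    return errors

# Empty list -> push to DB.Animal; otherwise each error is
# posted as a comment on the eLabFTW entry.
print(validate_animal_meta({"delivery_date": "2024-01-15",
                            "animal_nr": "42",
                            "age_at_delivery": "8.5"}))  # -> []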

Crawling for viruses#

  • reads viruses in DB

  • reads entries in eLabFTW (of resource category Virus)

  • loads the meta fields

  • check:

    • delivery date is entered

  • if no errors so far, tries to push to the DB (DB.Virus)

  • if errors, comments on the entry

  • if push successful, changes title to virus#batch_nr

  • adds tva as a tag to the entry

  • comments success

  • logs entry

Crawling for behavioral entries#

crawl4Sessions

  • reads entries in eLabFTW (of resource category Behavior and locked!)

  • checks if entry was already crawled (DB entry)

  • loads the meta fields

  • creates a session_id

  • pushes to the DB (DB.Session)
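
The locked-entry filter plus the already-crawled check is the common pattern of all crawl4* routines; a hedged sketch (the helper name and the exact shape of the duplicate check are assumptions):

def iter_uncrawled_locked(entries: list, crawled_ids: set):
    """Yield locked eLabFTW entries that are not yet in the DB."""
    for entry in entries:
        if not entry.get("locked"):
            continue                  # only locked entries count as final
        if entry["id"] in crawled_ids:
            continue                  # already crawled, skip
        yield entry

# Example: entry 1 is locked and uncrawled, so only it passes.
entries = [{"id": 1, "locked": 1}, {"id": 2, "locked": 0}, {"id": 3, "locked": 1}]
for entry in iter_uncrawled_locked(entries, crawled_ids={3}):
    print(entry["id"])  # -> 1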

Crawling for perfusions#

crawl4Perfusions

  • reads entries in eLabFTW (of resource category Perfusion and locked!)

  • checks if entry was already crawled (DB entry)

  • loads the meta fields

  • checks for animal_id

  • pushes to the DB (DB.Animal.Death)

  • creates an animal sheet entry with some pre-filled fields:

    • procedure: f"Euthanasia,{perfusion_d['drugs']} Perfusion"

    • weiterleben: 'Toetung'

Crawling for water deprivations#

crawl4Waterdeprivation

  • reads entries in eLabFTW (of resource category WaterDeprivation and locked!)

  • checks if entry was already crawled (DB entry)

  • loads the meta fields

  • pushes to the DB (DB.ZWR)

  • creates an animal sheet entry with some pre-filled fields:

    • procedure: f'Kontrolle und Beginn der Wasserdeprivation, ZWR:{ZWR}'

Crawling for animal sheets#

crawl4AnimalSheetEntries

  • reads entries in eLabFTW (of resource category AnimalDocumentation and locked!)

  • checks if entry was already crawled (DB entry)

  • loads the meta fields

  • pushes to the DB (DB.AnimalSheetEntry)

Crawling for surgeries#

crawl4Surgeries

  • reads entries in eLabFTW (of resource category Surgery and locked!)

  • checks if entry was already crawled (DB entry)

  • loads the meta fields

  • check:

    • surgery date is entered

    • animal_nr is entered and is int

    • age at delivery is a float

    • user entered

  • checks if the entry is already in the DB via animal_id and surgery datetime

  • pushes to the DB (DB.Surgery)

  • logs idx as crawled

  • puts animal_id as tag

  • links the animal to the entry

  • puts tva as a tag if available

  • creates an animalsheet entry in DB and eLabFTW (links to this entry)
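
The duplicate check against the DB can be written as a DataJoint restriction. A hedged sketch of the idiom; it assumes the configured connection from "Access to DataJoint" and an importable schema module DB (the import path and field names are assumptions, not the crawler's real code):

# Assumption: DB is the crawler's DataJoint schema module; this import
# path is illustrative only.
from datastructure_tools import DB

# Key fields mirroring the duplicate check described above (values made up).
key = {"animal_id": "A042", "surgery_datetime": "2024-02-01 10:30:00"}

# DataJoint restriction: `table & key` selects the matching rows.
# Insert only if no surgery with this animal_id/datetime exists yet.
if len(DB.Surgery & key) == 0:
    DB.Surgery.insert1({**key, "user": "xy123"})  # remaining fields omitted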

Pushes#

At initialization the crawler can push the current state of the DB to eLabFTW. The templates for the individual entries are designed so that the fields have the same names as in the DB; this allows easy exchange via dictionaries.
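
A minimal illustration of that name parity (the field values are made up; only the mechanism matters):

# Because template field names match DB column names, a DB row can be
# turned into eLabFTW metadata (and back) with a plain dict comprehension.
db_row = {"animal_id": "A042", "delivery_date": "2024-01-15", "animal_nr": 42}
template_fields = {"animal_id", "delivery_date", "animal_nr", "user"}

# DB -> eLabFTW: keep only the keys the template knows about.
meta = {k: v for k, v in db_row.items() if k in template_fields}
print(meta)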

Categories#

Categories are created from the classes ResourceCategories and ExperimentCategories; get_expCategory_id returns the category id, or creates the category if it is not available yet. Individual categories are subclasses of the corresponding class. This is done to ensure consistent naming and to use those names as enums.
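
A hedged sketch of that pattern (the subclass bodies are simplified; get_expCategory_id itself is not reproduced here):

class ResourceCategories:
    """Base class; subclass names double as the canonical category names."""

class Animals(ResourceCategories):
    pass

class Virus(ResourceCategories):
    pass

# The class name serves as an enum-like, typo-safe category name:
print(Animals.__name__)  # -> "Animals"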

Templates#

Templates can be designed in the browser or programmatically. Here we use classes like WaterDeprivationFields, in which subclasses represent the individual meta fields to be entered. Properties such as the field type are set from FieldTypes. Important: the class name needs to correspond to the field name in the DB if the field is to be filled or transferred to the DB. The name displayed to the user is defined in the property name. The order of fields can be defined via the property position (1-based). The property required indicates to the user that a field is required, but does not enforce it.
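
A hedged sketch of such a template class (the concrete fields and the FieldTypes values are assumptions; only ZWR and the property names appear in the sections above):

class FieldTypes:
    # Assumed minimal stand-in for the real FieldTypes helper.
    TEXT = "text"
    NUMBER = "number"
    DATE = "date"

class WaterDeprivationFields:
    """Container class; each inner class is one meta field."""

    class start_date:            # class name == DB field name
        name = "Start date"      # label displayed to the user
        field_type = FieldTypes.DATE
        position = 1             # 1-based order in the template
        required = True          # a hint to the user, not enforced

    class ZWR:
        name = "ZWR"
        field_type = FieldTypes.NUMBER
        position = 2
        required = True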

Pushing animals from DB to eLabFTW#

Template: AnimalFields. The function pushAnimalsDB2ELN from eLabFTW_api_utils pushes the animal entries that are in the DB to eLabFTW:

  • reads animals from DB

  • reads animals from eLabFTW (resources of category Animals); considers the title of each entry as the animal_id

  • checks whether the animal is already in eLabFTW

  • fills the template fields from DB
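
A hedged sketch of the title-based duplicate check (variable names and values are illustrative):

# Titles of existing eLabFTW animal entries serve as animal_ids.
eln_titles = {"A040", "A041"}                                # read from eLabFTW
db_animals = [{"animal_id": "A041"}, {"animal_id": "A042"}]  # read from DB

# Only animals whose id is not yet a title in eLabFTW get pushed.
to_push = [a for a in db_animals if a["animal_id"] not in eln_titles]
print([a["animal_id"] for a in to_push])  # -> ['A042']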

Pushing viruses from DB to eLabFTW#

Template: VirusFields. The function pushVirusDB2ELN from eLabFTW_api_utils pushes the virus entries that are in the DB to eLabFTW:

  • reads viruses from DB

  • reads viruses from eLabFTW (resources of category Virus); reads virus and batch from the title f"{virus['virus']}#{virus['batch_nr']}"

  • checks whether the virus is already in eLabFTW

  • fills the template fields from DB
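
The composite title encodes both the virus and its batch; a minimal round-trip sketch (the example values are made up):

virus = {"virus": "AAV9-GCaMP6f", "batch_nr": "B03"}  # example values

# DB -> eLabFTW: build the composite title.
title = f"{virus['virus']}#{virus['batch_nr']}"

# eLabFTW -> DB: split it back into its two parts.
name, batch_nr = title.split("#", 1)
print(name, batch_nr)  # -> AAV9-GCaMP6f B03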

written by: Artur
last modified: 2024-02-22