This repository is a pure abstract skeleton intended to define stable contracts for a multi-spider data pipeline. It deliberately contains only abstract/base classes and core models, without any concrete implementations.

Key modules (all abstract-only):

- `BaseSpider` with clear lifecycle hooks and advisory attributes.
- `BaseDB` defining minimal persistence operations and hooks.
- `BaseReporter` defining reporting lifecycle.
- `RunnerBase` and `SchedulerBase` for coordination/scheduling.
- `AnalyticsBase` for generic analytics pipelines.

Guidelines to extend (no code here, only how-to):

Implementing a spider (outline only):

- Subclass `BaseSpider` and implement:
  - `build_payload(url) -> Payload`
  - `fetch(url, payload) -> str`
  - `parse(url, content, payload) -> Documents`
- Lifecycle hooks: `on_run_start`/`on_run_end`, `should_fetch`, `before_fetch`/`after_fetch`, `transform`, `handle_error`, `close`.
- Advisory attributes: `max_retries`, `request_timeout_s`.

Implementing a DB backend (outline only):

- Subclass `BaseDB` and implement: `connect`, `ensure_indexes`, `insert_many`, `close`.
- Hooks: `on_connect`/`on_close`, `before_insert`/`after_insert`.

Implementing a reporter (outline only):

- Subclass `BaseReporter` and implement: `notify_start`, `notify_success`, `notify_error`, `notify_summary`.
- Hooks: `on_session_start`/`on_session_end`.

Implementing a runner/scheduler (outline only):

- Subclass `RunnerBase` to coordinate spiders -> DB -> reporters.
- Subclass `SchedulerBase` to install/trigger schedules (e.g., via systemd/cron in your own code).

Implementing analytics (outline only):

- Subclass `AnalyticsBase` and implement `compute(data, **kwargs)`.
- Hooks: `prepare`, `validate`, `transform`, `finalize`.

## Operations: systemd templates

- Templates: `ops/systemd/databank.service` and `ops/systemd/databank.timer`.
- Adjust `User`, `WorkingDirectory`, and `ExecStart` for your environment.

## Optional linting (no deps enforced)

- Configuration lives in `pyproject.toml` under `[tool.pylint.*]`.
- Run `pylint src/databank` (assuming Pylint is installed in your environment).

## Optional typing and linting (ruff/mypy)

- Configuration lives in `pyproject.toml`.
- Run `ruff check src/databank` and `mypy src/databank`.

## License

See `LICENSE` for details.
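
## Illustrative sketch (not part of the repository)

The spider and DB backend outlines above can be made concrete with a small, self-contained sketch. Because the repository ships only abstract classes and their exact signatures are not reproduced here, the snippet defines its own stand-in `BaseSpider` and `BaseDB` that follow the method names listed above. The `Payload` and `Documents` aliases, the `StaticSpider` and `MemoryDB` classes, the return type of `insert_many`, and the `run` helper are all hypothetical names chosen for illustration, and `run` is a plain function rather than a `RunnerBase` subclass.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

# Assumed shapes; the repository's real Payload/Documents models may differ.
Payload = Dict[str, Any]
Documents = List[Dict[str, Any]]


class BaseSpider(ABC):
    """Stand-in for the repository's BaseSpider (assumed interface)."""

    max_retries: int = 0              # advisory attribute
    request_timeout_s: float = 10.0   # advisory attribute

    @abstractmethod
    def build_payload(self, url: str) -> Payload: ...

    @abstractmethod
    def fetch(self, url: str, payload: Payload) -> str: ...

    @abstractmethod
    def parse(self, url: str, content: str, payload: Payload) -> Documents: ...

    def should_fetch(self, url: str) -> bool:
        """Hook: fetch everything unless a subclass overrides this."""
        return True


class BaseDB(ABC):
    """Stand-in for the repository's BaseDB (assumed interface)."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def ensure_indexes(self) -> None: ...

    @abstractmethod
    def insert_many(self, docs: Documents) -> int: ...

    @abstractmethod
    def close(self) -> None: ...


class StaticSpider(BaseSpider):
    """Toy spider that reads from an in-memory mapping instead of the network."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    def build_payload(self, url: str) -> Payload:
        return {"url": url}

    def fetch(self, url: str, payload: Payload) -> str:
        return self.pages[url]

    def parse(self, url: str, content: str, payload: Payload) -> Documents:
        # One document per non-empty line of the fetched content.
        return [{"url": url, "line": line} for line in content.splitlines() if line]

    def should_fetch(self, url: str) -> bool:
        return url in self.pages


class MemoryDB(BaseDB):
    """Toy backend that stores documents in a list."""

    def __init__(self) -> None:
        self.rows: Documents = []
        self.connected = False

    def connect(self) -> None:
        self.connected = True

    def ensure_indexes(self) -> None:
        pass  # nothing to index for an in-memory list

    def insert_many(self, docs: Documents) -> int:
        self.rows.extend(docs)
        return len(docs)

    def close(self) -> None:
        self.connected = False


def run(spider: BaseSpider, db: BaseDB, urls: Iterable[str]) -> int:
    """Coordinate spider -> DB for each URL and return the number of stored documents."""
    db.connect()
    db.ensure_indexes()
    stored = 0
    try:
        for url in urls:
            if not spider.should_fetch(url):
                continue
            payload = spider.build_payload(url)
            content = spider.fetch(url, payload)
            stored += db.insert_many(spider.parse(url, content, payload))
    finally:
        db.close()  # always release the backend, even if a fetch fails
    return stored


if __name__ == "__main__":
    demo_spider = StaticSpider({"https://example.test/a": "first\nsecond\n"})
    print(run(demo_spider, MemoryDB(), ["https://example.test/a"]))
```

A real implementation would subclass the repository's own base classes instead of these stand-ins, and would typically add reporter notifications and error handling through the hooks listed above.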