Data and prediction models for the English men's professional football leagues


Databank (Abstract Skeleton)

This repository is a pure abstract skeleton intended to define stable contracts for a multi-spider data pipeline. It deliberately contains only abstract/base classes and core models, without any concrete implementations.

Key modules (all abstract-only):

spiders: BaseSpider with clear lifecycle hooks and advisory attributes.
db: BaseDB defining minimal persistence operations and hooks.
reporter: BaseReporter defining reporting lifecycle.
scheduler: RunnerBase and SchedulerBase for coordination/scheduling.
analytics: AnalyticsBase for generic analytics pipelines.

Guidelines for extending:

Implementations MUST live in your own packages/modules and import these bases.
Do NOT modify the base interfaces unless you intend a breaking change.
Prefer composition and dependency injection over hard-coding dependencies.

Implementing a spider (outline only):

Subclass BaseSpider and implement:
- build_payload(url) -> Payload
- fetch(url, payload) -> str
- parse(url, content, payload) -> Documents
Optionally override hooks:
- on_run_start/on_run_end, should_fetch, before_fetch/after_fetch, transform, handle_error, close.
Optionally honor advisory attributes like max_retries, request_timeout_s.
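The outline above can be sketched as follows. This is a minimal illustration, not the repository's code: the `BaseSpider` class, the `Payload` and `Documents` aliases, and the default attribute values are stand-ins defined here so that the example runs on its own, and the real base class may use different signatures. `StaticSpider` is a hypothetical subclass that serves canned content instead of making network requests.

```python
from abc import ABC, abstractmethod
from typing import Any

Payload = dict[str, Any]          # stand-in alias; the real type may differ
Documents = list[dict[str, Any]]  # stand-in alias; the real type may differ


class BaseSpider(ABC):
    """Stand-in for databank's BaseSpider (assumed shape only)."""

    max_retries: int = 3             # advisory attribute, assumed default
    request_timeout_s: float = 10.0  # advisory attribute, assumed default

    @abstractmethod
    def build_payload(self, url: str) -> Payload: ...

    @abstractmethod
    def fetch(self, url: str, payload: Payload) -> str: ...

    @abstractmethod
    def parse(self, url: str, content: str, payload: Payload) -> Documents: ...

    def should_fetch(self, url: str) -> bool:
        """Optional hook: fetch everything by default."""
        return True


class StaticSpider(BaseSpider):
    """Hypothetical spider that reads pages from an in-memory mapping."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def build_payload(self, url: str) -> Payload:
        # Honor the advisory timeout by passing it along in the payload.
        return {"timeout": self.request_timeout_s}

    def fetch(self, url: str, payload: Payload) -> str:
        return self.pages[url]

    def parse(self, url: str, content: str, payload: Payload) -> Documents:
        # Emit one document per non-empty line of content.
        return [{"url": url, "line": line} for line in content.splitlines() if line]

    def should_fetch(self, url: str) -> bool:
        # Skip URLs this spider has no content for.
        return url in self.pages


spider = StaticSpider({"mem://a": "x\n\ny"})
payload = spider.build_payload("mem://a")
docs = spider.parse("mem://a", spider.fetch("mem://a", payload), payload)
```

Keeping `fetch` and `parse` separate means the parsing logic can be tested against stored content without any network access.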

Implementing a DB backend (outline only):

Subclass BaseDB and implement: connect, ensure_indexes, insert_many, close.
Optionally override hooks: on_connect/on_close, before_insert/after_insert.
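As an illustration of the four required methods, here is a minimal in-memory backend. The `BaseDB` class below is a stand-in written for this example, so its signatures (including the integer return value of `insert_many` and the `id` field used for de-duplication) are assumptions rather than the repository's actual contract.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable


class BaseDB(ABC):
    """Stand-in for databank's BaseDB (assumed shape only)."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def ensure_indexes(self) -> None: ...

    @abstractmethod
    def insert_many(self, docs: Iterable[dict[str, Any]]) -> int: ...

    @abstractmethod
    def close(self) -> None: ...

    def before_insert(self, docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
        """Optional hook: pass documents through unchanged by default."""
        return docs


class InMemoryDB(BaseDB):
    """Hypothetical backend that stores documents in a Python list."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []
        self.connected = False
        self._seen_ids: set[Any] = set()

    def connect(self) -> None:
        self.connected = True

    def ensure_indexes(self) -> None:
        # Emulate a unique index on "id" by tracking the ids already stored.
        self._seen_ids = {row["id"] for row in self.rows}

    def insert_many(self, docs: Iterable[dict[str, Any]]) -> int:
        if not self.connected:
            raise RuntimeError("connect() must be called before insert_many()")
        inserted = 0
        for doc in self.before_insert(list(docs)):
            if doc["id"] in self._seen_ids:
                continue  # skip duplicates instead of failing
            self._seen_ids.add(doc["id"])
            self.rows.append(doc)
            inserted += 1
        return inserted

    def close(self) -> None:
        self.connected = False


db = InMemoryDB()
db.connect()
db.ensure_indexes()
inserted = db.insert_many([{"id": 1}, {"id": 1}, {"id": 2}])
db.close()
```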

Implementing a reporter (outline only):

Subclass BaseReporter and implement: notify_start, notify_success, notify_error, notify_summary.
Optionally override on_session_start/on_session_end.
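A reporter along these lines might look like the sketch below. The `BaseReporter` stand-in and the argument lists of the four `notify_*` methods are assumptions made for this example; only the method names come from the outline above. `MemoryReporter` is a hypothetical implementation that collects messages in a list, which is convenient in tests.

```python
from abc import ABC, abstractmethod


class BaseReporter(ABC):
    """Stand-in for databank's BaseReporter (assumed signatures)."""

    @abstractmethod
    def notify_start(self, name: str) -> None: ...

    @abstractmethod
    def notify_success(self, name: str, count: int) -> None: ...

    @abstractmethod
    def notify_error(self, name: str, error: Exception) -> None: ...

    @abstractmethod
    def notify_summary(self) -> None: ...


class MemoryReporter(BaseReporter):
    """Hypothetical reporter that records messages instead of sending them."""

    def __init__(self) -> None:
        self.messages: list[str] = []
        self.succeeded = 0
        self.failed = 0

    def notify_start(self, name: str) -> None:
        self.messages.append(f"start {name}")

    def notify_success(self, name: str, count: int) -> None:
        self.succeeded += 1
        self.messages.append(f"ok {name}: {count} documents")

    def notify_error(self, name: str, error: Exception) -> None:
        self.failed += 1
        self.messages.append(f"error {name}: {error}")

    def notify_summary(self) -> None:
        self.messages.append(f"summary: {self.succeeded} ok, {self.failed} failed")


reporter = MemoryReporter()
reporter.notify_start("fixtures")
reporter.notify_success("fixtures", 12)
reporter.notify_start("standings")
reporter.notify_error("standings", TimeoutError("timed out"))
reporter.notify_summary()
```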

Implementing a runner/scheduler (outline only):

Subclass RunnerBase to coordinate spiders -> DB -> reporters.
Subclass SchedulerBase to install/trigger schedules (e.g., via systemd/cron in your own code).
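The coordination step can be sketched with dependency injection, in line with the guideline to prefer composition. Everything below is an assumption made for illustration: the `RunnerBase` stand-in, its `run` signature, and the choice to inject plain callables (`crawl`, `store`, `report`) rather than spider, DB, and reporter objects. A real runner would more likely receive instances of the corresponding base classes.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Iterable

Doc = dict[str, Any]  # stand-in document type


class RunnerBase(ABC):
    """Stand-in for databank's RunnerBase (assumed shape only)."""

    @abstractmethod
    def run(self, urls: Iterable[str]) -> int: ...


class SimpleRunner(RunnerBase):
    """Hypothetical runner wiring crawl -> store -> report via injected callables."""

    def __init__(
        self,
        crawl: Callable[[str], list[Doc]],
        store: Callable[[list[Doc]], int],
        report: Callable[[str], None],
    ) -> None:
        self._crawl = crawl
        self._store = store
        self._report = report

    def run(self, urls: Iterable[str]) -> int:
        total = 0
        for url in urls:
            try:
                total += self._store(self._crawl(url))
                self._report(f"ok {url}")
            except Exception as exc:  # one failing URL must not stop the run
                self._report(f"error {url}: {exc}")
        self._report(f"stored {total}")
        return total


# Usage with in-memory fakes in place of a real spider, DB, and reporter.
stored: list[Doc] = []
log: list[str] = []


def crawl(url: str) -> list[Doc]:
    if url == "bad":
        raise ValueError("boom")
    return [{"url": url}]


def store(docs: list[Doc]) -> int:
    stored.extend(docs)
    return len(docs)


runner = SimpleRunner(crawl, store, log.append)
total = runner.run(["a", "bad", "b"])
```

Because the collaborators are passed in, the runner can be exercised without any network or database access.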

Implementing analytics (outline only):

Subclass AnalyticsBase and implement compute(data, **kwargs).
Optional staged hooks: prepare, validate, transform, finalize.
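A small analytics subclass could look like the following sketch. The `AnalyticsBase` stand-in, the `run` method that chains the stages, and the order in which it calls them are assumptions for this example; the real base class may orchestrate the hooks differently. The match fields `home_goals` and `away_goals` and the `side` keyword are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class AnalyticsBase(ABC):
    """Stand-in for databank's AnalyticsBase (assumed shape only)."""

    @abstractmethod
    def compute(self, data: Any, **kwargs: Any) -> Any: ...

    # Optional staged hooks with pass-through defaults.
    def prepare(self, data: Any) -> Any:
        return data

    def validate(self, data: Any) -> None:
        return None

    def transform(self, data: Any) -> Any:
        return data

    def finalize(self, result: Any) -> Any:
        return result

    def run(self, data: Any, **kwargs: Any) -> Any:
        # Assumed stage order: prepare, validate, transform, compute, finalize.
        data = self.prepare(data)
        self.validate(data)
        data = self.transform(data)
        return self.finalize(self.compute(data, **kwargs))


class WinRate(AnalyticsBase):
    """Hypothetical analysis: share of matches won by the home or away side."""

    def validate(self, data: list[dict[str, int]]) -> None:
        if not data:
            raise ValueError("at least one match is required")

    def compute(self, data: list[dict[str, int]], **kwargs: Any) -> float:
        side = kwargs.get("side", "home")
        other = "away" if side == "home" else "home"
        wins = sum(1 for m in data if m[f"{side}_goals"] > m[f"{other}_goals"])
        return wins / len(data)

    def finalize(self, result: float) -> float:
        return round(result, 3)


matches = [
    {"home_goals": 2, "away_goals": 1},
    {"home_goals": 0, "away_goals": 0},
    {"home_goals": 1, "away_goals": 3},
    {"home_goals": 3, "away_goals": 0},
]
home_rate = WinRate().run(matches)
away_rate = WinRate().run(matches, side="away")
```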

Operations: systemd templates

See ops/systemd/databank.service and ops/systemd/databank.timer.
Customize User, WorkingDirectory, and ExecStart for your environment.

Optional linting (no deps enforced)

A minimal Pylint config is included in pyproject.toml under [tool.pylint.*].
If Pylint is installed in your environment, you can run it with, for example:
- pylint src/databank

The config disables ABC-related false positives while keeping docstring checks.

Optional typing and linting (ruff/mypy)

Minimal configs for Ruff and mypy are also included in pyproject.toml.
If you have them installed locally, you can run, for example:
- ruff check src/databank
- mypy src/databank

Both are optional and will not run unless you invoke them.

License

See LICENSE for details.
