English men's professional football league data and prediction models

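admin db7d006eb1 Add the initial structure of the Databank project, including abstract classes and core models; configure the system service and timer; update project metadata and documentation. 3 months ago
ops db7d006eb1 Add the initial structure of the Databank project, including abstract classes and core models; configure the system service and timer; update project metadata and documentation. 3 months ago
src db7d006eb1 Add the initial structure of the Databank project, including abstract classes and core models; configure the system service and timer; update project metadata and documentation. 3 months ago
.gitignore 308002100f Initial commit 3 months ago
LICENSE 308002100f Initial commit 3 months ago
README.md db7d006eb1 Add the initial structure of the Databank project, including abstract classes and core models; configure the system service and timer; update project metadata and documentation. 3 months ago
pyproject.toml db7d006eb1 Add the initial structure of the Databank project, including abstract classes and core models; configure the system service and timer; update project metadata and documentation. 3 months ago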

README.md

Databank (Abstract Skeleton)

This repository is a pure abstract skeleton intended to define stable contracts for a multi-spider data pipeline. It deliberately contains only abstract/base classes and core models, without any concrete implementations.

Key modules (all abstract-only):

  • spiders: BaseSpider with clear lifecycle hooks and advisory attributes.
  • db: BaseDB defining minimal persistence operations and hooks.
  • reporter: BaseReporter defining reporting lifecycle.
  • scheduler: RunnerBase and SchedulerBase for coordination/scheduling.
  • analytics: AnalyticsBase for generic analytics pipelines.

Guidelines to extend (no code here, only how-to):

  • Implementations MUST live in your own packages/modules and import these bases.
  • Do NOT modify the base interfaces unless you intend a breaking change.
  • Prefer composition and dependency injection over hard-coding dependencies.

Implementing a spider (outline only):

  1. Subclass BaseSpider and implement:
    • build_payload(url) -> Payload
    • fetch(url, payload) -> str
    • parse(url, content, payload) -> Documents
  2. Optionally override hooks:
    • on_run_start/on_run_end, should_fetch, before_fetch/after_fetch, transform, handle_error, close.
  3. Optionally honor advisory attributes like max_retries, request_timeout_s.
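
The outline above might look roughly like the following in your own package. This is a minimal sketch, not the repository's code: the `BaseSpider` shown here is a local stand-in so the example runs on its own, `Payload` and `Documents` are assumed to be a plain dict and a list of dicts, the advisory attribute defaults are invented, and `StaticFixtureSpider` and `crawl` are hypothetical names. In real use you would import the base class from the databank package instead.

```python
from abc import ABC, abstractmethod
from typing import Any

# Assumed stand-ins: the real Payload/Documents types live in the databank package.
Payload = dict[str, Any]
Documents = list[dict[str, Any]]


class BaseSpider(ABC):
    """Local stand-in for the real BaseSpider (required methods plus one hook)."""

    max_retries: int = 3              # advisory attribute; default value is assumed
    request_timeout_s: float = 10.0   # advisory attribute; default value is assumed

    @abstractmethod
    def build_payload(self, url: str) -> Payload: ...

    @abstractmethod
    def fetch(self, url: str, payload: Payload) -> str: ...

    @abstractmethod
    def parse(self, url: str, content: str, payload: Payload) -> Documents: ...

    def should_fetch(self, url: str) -> bool:
        """Optional hook; fetch everything unless a subclass says otherwise."""
        return True


class StaticFixtureSpider(BaseSpider):
    """Hypothetical spider that serves canned CSV text instead of doing network I/O."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    def build_payload(self, url: str) -> Payload:
        # Carry the advisory timeout along so fetch() could honor it.
        return {"url": url, "timeout_s": self.request_timeout_s}

    def fetch(self, url: str, payload: Payload) -> str:
        return self._pages[url]

    def parse(self, url: str, content: str, payload: Payload) -> Documents:
        docs: Documents = []
        for line in content.strip().splitlines():
            home, away, score = line.split(",")
            docs.append({"source": url, "home": home, "away": away, "score": score})
        return docs

    def should_fetch(self, url: str) -> bool:
        # Skip URLs for which no canned page exists.
        return url in self._pages


def crawl(spider: BaseSpider, url: str) -> Documents:
    """Hypothetical driver: should_fetch -> build_payload -> fetch -> parse."""
    if not spider.should_fetch(url):
        return []
    payload = spider.build_payload(url)
    content = spider.fetch(url, payload)
    return spider.parse(url, content, payload)
```

Keeping network access out of the example makes the lifecycle order easy to test; a real spider would replace the body of `fetch` with an HTTP call.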

Implementing a DB backend (outline only):

  1. Subclass BaseDB and implement: connect, ensure_indexes, insert_many, close.
  2. Optionally override hooks: on_connect/on_close, before_insert/after_insert.
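
As an illustration of the four required methods, here is a minimal in-memory backend. It is a sketch under assumptions: `BaseDB` below is a local stand-in for the real base class, the `insert_many(docs) -> int` signature and its "return the number of rows inserted" convention are guesses, and `InMemoryDB` with its `unique_key` parameter is a hypothetical name.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable

Document = dict[str, Any]  # assumed document shape


class BaseDB(ABC):
    """Local stand-in for the real BaseDB; signatures are assumptions."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def ensure_indexes(self) -> None: ...

    @abstractmethod
    def insert_many(self, docs: Iterable[Document]) -> int: ...

    @abstractmethod
    def close(self) -> None: ...


class InMemoryDB(BaseDB):
    """Hypothetical backend that keeps rows in a dict keyed by a unique field."""

    def __init__(self, unique_key: str) -> None:
        self._unique_key = unique_key
        self._rows: dict[Any, Document] = {}
        self._connected = False

    def connect(self) -> None:
        self._connected = True

    def ensure_indexes(self) -> None:
        # Nothing to build: the dict keyed by unique_key already acts as a unique index.
        pass

    def insert_many(self, docs: Iterable[Document]) -> int:
        if not self._connected:
            raise RuntimeError("connect() must be called before insert_many()")
        inserted = 0
        for doc in docs:
            key = doc[self._unique_key]
            if key not in self._rows:      # silently skip duplicates
                self._rows[key] = dict(doc)
                inserted += 1
        return inserted

    def close(self) -> None:
        self._connected = False

    def count(self) -> int:
        """Convenience helper for tests; not part of the assumed base contract."""
        return len(self._rows)
```

A backend like this is mostly useful in tests, where a runner can be exercised without a database server.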

Implementing a reporter (outline only):

  1. Subclass BaseReporter and implement: notify_start, notify_success, notify_error, notify_summary.
  2. Optionally override on_session_start/on_session_end.
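
A reporter along these lines could simply collect messages in memory. The sketch below assumes argument lists for the four `notify_*` methods, since the outline does not state them; `BaseReporter` here is a local stand-in and `MemoryReporter` is a hypothetical name.

```python
from abc import ABC, abstractmethod


class BaseReporter(ABC):
    """Local stand-in for the real BaseReporter; signatures are assumptions."""

    @abstractmethod
    def notify_start(self, job: str) -> None: ...

    @abstractmethod
    def notify_success(self, job: str, count: int) -> None: ...

    @abstractmethod
    def notify_error(self, job: str, error: Exception) -> None: ...

    @abstractmethod
    def notify_summary(self) -> str: ...

    # Optional session hooks default to no-ops.
    def on_session_start(self) -> None:
        pass

    def on_session_end(self) -> None:
        pass


class MemoryReporter(BaseReporter):
    """Hypothetical reporter that records events in a list instead of sending them."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self._ok = 0
        self._failed = 0

    def on_session_start(self) -> None:
        self.events.append("session opened")

    def on_session_end(self) -> None:
        self.events.append("session closed")

    def notify_start(self, job: str) -> None:
        self.events.append(f"{job} started")

    def notify_success(self, job: str, count: int) -> None:
        self._ok += 1
        self.events.append(f"{job} stored {count} documents")

    def notify_error(self, job: str, error: Exception) -> None:
        self._failed += 1
        self.events.append(f"{job} failed: {error}")

    def notify_summary(self) -> str:
        summary = f"{self._ok} succeeded, {self._failed} failed"
        self.events.append(summary)
        return summary
```

Swapping the list for an email or chat client would only change the bodies of these methods, not the callers.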

Implementing a runner/scheduler (outline only):

  1. Subclass RunnerBase to coordinate spiders -> DB -> reporters.
  2. Subclass SchedulerBase to install/trigger schedules (e.g., via systemd/cron in your own code).
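
The coordination step can be sketched with plain callables injected into the runner, in line with the dependency-injection guideline. Everything here is illustrative: `RunnerBase` and `SchedulerBase` are local stand-ins, the method names `run`, `install`, and `trigger` are assumptions about the real interfaces, and `SequentialRunner` and `ManualScheduler` are hypothetical. A production scheduler would hand off to systemd or cron rather than hold jobs in a dict.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Iterable

Document = dict[str, Any]                    # assumed document shape
Crawl = Callable[[str], list[Document]]      # spider side: url -> documents
Store = Callable[[list[Document]], int]      # DB side: documents -> rows stored
Report = Callable[[str], None]               # reporter side: message sink


class RunnerBase(ABC):
    """Local stand-in; the real method name and signature may differ."""

    @abstractmethod
    def run(self, urls: Iterable[str]) -> int: ...


class SchedulerBase(ABC):
    """Local stand-in; the real method names and signatures may differ."""

    @abstractmethod
    def install(self, name: str, job: Callable[[], int]) -> None: ...

    @abstractmethod
    def trigger(self, name: str) -> int: ...


class SequentialRunner(RunnerBase):
    """Hypothetical runner: crawl each URL, store the result, report the outcome."""

    def __init__(self, crawl: Crawl, store: Store, report: Report) -> None:
        self._crawl = crawl
        self._store = store
        self._report = report

    def run(self, urls: Iterable[str]) -> int:
        total = 0
        for url in urls:
            try:
                stored = self._store(self._crawl(url))
            except Exception as exc:
                # One failing URL is reported and skipped, not fatal to the run.
                self._report(f"error {url}: {exc}")
                continue
            total += stored
            self._report(f"ok {url}: {stored}")
        return total


class ManualScheduler(SchedulerBase):
    """Hypothetical scheduler that only runs jobs when explicitly triggered."""

    def __init__(self) -> None:
        self._jobs: dict[str, Callable[[], int]] = {}

    def install(self, name: str, job: Callable[[], int]) -> None:
        self._jobs[name] = job

    def trigger(self, name: str) -> int:
        return self._jobs[name]()
```

Because the runner only sees three callables, any spider, DB backend, or reporter can be plugged in by passing bound methods such as `db.insert_many` or `log.append`.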

Implementing analytics (outline only):

  1. Subclass AnalyticsBase and implement compute(data, **kwargs).
  2. Optional staged hooks: prepare, validate, transform, finalize.
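
One way to read the staged hooks is as a template method around `compute`. The sketch below assumes the order prepare, validate, transform, compute, finalize and a `run` entry point, neither of which is confirmed by the outline; `AnalyticsBase` is a local stand-in, and `HomeWinRate` with its `min_margin` keyword and `home_goals`/`away_goals` fields is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any


class AnalyticsBase(ABC):
    """Local stand-in; hook order and the run() entry point are assumptions."""

    def prepare(self, data: Any) -> Any:
        return data

    def validate(self, data: Any) -> None:
        return None

    def transform(self, data: Any) -> Any:
        return data

    def finalize(self, result: Any) -> Any:
        return result

    @abstractmethod
    def compute(self, data: Any, **kwargs: Any) -> Any: ...

    def run(self, data: Any, **kwargs: Any) -> Any:
        data = self.prepare(data)
        self.validate(data)
        data = self.transform(data)
        return self.finalize(self.compute(data, **kwargs))


class HomeWinRate(AnalyticsBase):
    """Hypothetical metric: share of matches won by the home side."""

    def prepare(self, data: Any) -> list[dict[str, int]]:
        # Materialize once so later stages can iterate repeatedly.
        return list(data)

    def validate(self, data: list[dict[str, int]]) -> None:
        for row in data:
            if "home_goals" not in row or "away_goals" not in row:
                raise ValueError(f"row is missing a goals field: {row}")

    def compute(self, data: list[dict[str, int]], **kwargs: Any) -> float:
        min_margin = kwargs.get("min_margin", 1)  # goals the home side must win by
        if not data:
            return 0.0
        wins = sum(1 for r in data if r["home_goals"] - r["away_goals"] >= min_margin)
        return wins / len(data)

    def finalize(self, result: float) -> float:
        return round(result, 3)
```

For example, `HomeWinRate().run(matches, min_margin=2)` would count only home wins by two or more goals.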

Operations: systemd templates

  • See ops/systemd/databank.service and ops/systemd/databank.timer.
  • Customize User, WorkingDirectory, and ExecStart for your environment.

Optional linting (no deps enforced)

  • A minimal Pylint config is included in pyproject.toml under [tool.pylint.*].
  • If Pylint is installed in your environment, you can run it, for example:
    • pylint src/databank
    • The config disables ABC-related false positives while keeping docstring checks.

Optional typing and linting (ruff/mypy)

  • Minimal configs for Ruff and mypy are also included in pyproject.toml.
  • If you have them installed locally, example commands:
    • ruff check src/databank
    • mypy src/databank
    • Both are optional and will not run unless you invoke them.

License

  • See LICENSE for details.