Data and prediction models for English men's professional football leagues

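admin 103c20cc6b Add analyzer skeletons, including team extraction, Elo, Dixon-Coles, H2H, Markov chain, Monte Carlo, and strength analyzers; implement the basic structure and docs to support future feature extensions and data processing. 2 months ago
ops db7d006eb1 Add the initial structure of the Databank project, including abstract classes and core models; configure the system service and timer; update project metadata and docs. 3 months ago
scripts 103c20cc6b Add analyzer skeletons, including team extraction, Elo, Dixon-Coles, H2H, Markov chain, Monte Carlo, and strength analyzers; implement the basic structure and docs to support future feature extensions and data processing. 2 months ago
src 103c20cc6b Add analyzer skeletons, including team extraction, Elo, Dixon-Coles, H2H, Markov chain, Monte Carlo, and strength analyzers; implement the basic structure and docs to support future feature extensions and data processing. 2 months ago
tests 103c20cc6b Add analyzer skeletons, including team extraction, Elo, Dixon-Coles, H2H, Markov chain, Monte Carlo, and strength analyzers; implement the basic structure and docs to support future feature extensions and data processing. 2 months ago
.gitignore ff2f1d2be1 Add the DailyFileReporter implementation with per-day logging and archiving; update the example scripts to show integrated usage. 2 months ago
LICENSE 308002100f Initial commit 3 months ago
README.md ff2f1d2be1 Add the DailyFileReporter implementation with per-day logging and archiving; update the example scripts to show integrated usage. 2 months ago
pyproject.toml 0c27ee2645 Add the MongoDB backend implementation, configuration and bootstrap tools, and example scripts; update docs and abstract classes to improve project structure and usability. 3 months ago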

README.md

Databank (Abstract Skeleton)

Policy: scripts are executable, examples are stubs

  • All runnable entry points live under scripts/ and can be executed directly with python.
  • Files under src/databank/examples/ are stubs for guidance/docs only and will print instructions.
  • Prefer running demos via scripts:
    • python scripts/seed_leagues_mongo.py
    • python scripts/seed_seasons_mongo.py
    • python scripts/test_get_league_match_list.py
    • python scripts/reporter_demo.py

This repository is a pure abstract skeleton intended to define stable contracts for a multi-spider data pipeline. It deliberately contains only abstract/base classes and core models, without any concrete implementations.

Key modules (all abstract-only):

  • spiders: BaseSpider with clear lifecycle hooks and advisory attributes.
  • db: BaseDB defining minimal persistence operations and hooks.
  • reporter: BaseReporter defining reporting lifecycle.
  • scheduler: RunnerBase and SchedulerBase for coordination/scheduling.
  • analytics: AnalyticsBase for generic analytics pipelines.

Guidelines to extend (no code here, only how-to):

  • Implementations MUST live in your own packages/modules and import these bases.
  • Do NOT modify the base interfaces unless you intend a breaking change.
  • Prefer composition and dependency injection over hard-coding dependencies.

Implementing a spider (outline only):

  1. Subclass BaseSpider and implement:
    • build_payload(url) -> Payload
    • fetch(url, payload) -> str
    • parse(url, content, payload) -> Documents
  2. Optionally override hooks:
    • on_run_start/on_run_end, should_fetch, before_fetch/after_fetch, transform, handle_error, close.
  3. Optionally honor advisory attributes like max_retries, request_timeout_s.
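
As an illustration of the outline above, here is a minimal sketch of a spider. The `BaseSpider` class in it is a local stand-in written only so the snippet runs on its own; the real base class may use different signatures. `Payload` and `Documents` are assumed to be a plain dict and a list of dicts, and `StaticSpider`, `crawl`, and the `mem://` URL are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Any

Payload = dict[str, Any]          # assumption: the real Payload type may differ
Documents = list[dict[str, Any]]  # assumption: the real Documents type may differ


class BaseSpider(ABC):
    """Stand-in for databank's BaseSpider, reduced to the required methods."""

    max_retries = 3           # advisory attribute named in the README
    request_timeout_s = 10.0  # advisory attribute named in the README

    @abstractmethod
    def build_payload(self, url: str) -> Payload: ...

    @abstractmethod
    def fetch(self, url: str, payload: Payload) -> str: ...

    @abstractmethod
    def parse(self, url: str, content: str, payload: Payload) -> Documents: ...

    def should_fetch(self, url: str) -> bool:
        # Optional hook: fetch everything by default.
        return True


class StaticSpider(BaseSpider):
    """Hypothetical spider that serves canned CSV text instead of doing network I/O."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def build_payload(self, url: str) -> Payload:
        return {"timeout": self.request_timeout_s}

    def fetch(self, url: str, payload: Payload) -> str:
        return self.pages[url]

    def parse(self, url: str, content: str, payload: Payload) -> Documents:
        docs = []
        for line in content.strip().splitlines():
            home, away, score = line.split(",")
            docs.append({"home": home, "away": away, "score": score, "source": url})
        return docs


def crawl(spider: BaseSpider, url: str) -> Documents:
    """Hypothetical driver: should_fetch -> build_payload -> fetch -> parse."""
    if not spider.should_fetch(url):
        return []
    payload = spider.build_payload(url)
    return spider.parse(url, spider.fetch(url, payload), payload)


spider = StaticSpider({"mem://matches": "Arsenal,Chelsea,2-1\nLeeds,Everton,0-0\n"})
docs = crawl(spider, "mem://matches")
```

Keeping `fetch` and `parse` separate lets a test replace the network layer with canned content, as `StaticSpider` does here.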

Implementing a DB backend (outline only):

  1. Subclass BaseDB and implement: connect, ensure_indexes, insert_many, close.
  2. Optionally override hooks: on_connect/on_close, before_insert/after_insert.
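
The four required methods could be sketched with an in-memory backend, as below. `BaseDB` is again a local stand-in whose signatures are assumptions, and `InMemoryDB` and its `unique_key` parameter are hypothetical. The sketch treats `ensure_indexes` as the place where a uniqueness constraint is set up.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable


class BaseDB(ABC):
    """Stand-in for databank's BaseDB; the real signatures may differ."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def ensure_indexes(self) -> None: ...

    @abstractmethod
    def insert_many(self, docs: Iterable[dict[str, Any]]) -> int: ...

    @abstractmethod
    def close(self) -> None: ...


class InMemoryDB(BaseDB):
    """Hypothetical backend that keeps documents in a list, deduplicated by one key."""

    def __init__(self, unique_key: str = "match_id") -> None:
        self.unique_key = unique_key
        self.connected = False
        self.rows: list[dict[str, Any]] = []
        self._seen: set[Any] = set()

    def connect(self) -> None:
        self.connected = True

    def ensure_indexes(self) -> None:
        # Rebuild the "unique index" from whatever is already stored.
        self._seen = {row[self.unique_key] for row in self.rows}

    def insert_many(self, docs: Iterable[dict[str, Any]]) -> int:
        if not self.connected:
            raise RuntimeError("connect() must be called before insert_many()")
        inserted = 0
        for doc in docs:
            key = doc[self.unique_key]
            if key in self._seen:
                continue  # skip duplicates instead of failing the whole batch
            self._seen.add(key)
            self.rows.append(dict(doc))
            inserted += 1
        return inserted

    def close(self) -> None:
        self.connected = False


db = InMemoryDB()
db.connect()
db.ensure_indexes()
inserted = db.insert_many([{"match_id": 1}, {"match_id": 2}, {"match_id": 1}])
db.close()
```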

Implementing a reporter (outline only):

  1. Subclass BaseReporter and implement: notify_start, notify_success, notify_error, notify_summary.
  2. Optionally override on_session_start/on_session_end.
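
A reporter following this outline might look like the sketch below. The argument lists of the four `notify_*` methods are assumptions, since the README names the methods but not their parameters; `MemoryReporter` is a hypothetical class that collects messages in a list rather than sending them anywhere.

```python
from abc import ABC, abstractmethod


class BaseReporter(ABC):
    """Stand-in for databank's BaseReporter; argument lists are assumptions."""

    @abstractmethod
    def notify_start(self, task: str) -> None: ...

    @abstractmethod
    def notify_success(self, task: str, count: int) -> None: ...

    @abstractmethod
    def notify_error(self, task: str, error: Exception) -> None: ...

    @abstractmethod
    def notify_summary(self) -> None: ...


class MemoryReporter(BaseReporter):
    """Hypothetical reporter that records one line per event."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self.succeeded = 0
        self.failed = 0

    def notify_start(self, task: str) -> None:
        self.lines.append(f"START {task}")

    def notify_success(self, task: str, count: int) -> None:
        self.succeeded += 1
        self.lines.append(f"OK {task} ({count} documents)")

    def notify_error(self, task: str, error: Exception) -> None:
        self.failed += 1
        self.lines.append(f"FAIL {task}: {error}")

    def notify_summary(self) -> None:
        self.lines.append(f"SUMMARY ok={self.succeeded} failed={self.failed}")


reporter = MemoryReporter()
reporter.notify_start("premier-league")
reporter.notify_success("premier-league", 380)
reporter.notify_start("championship")
reporter.notify_error("championship", TimeoutError("no response"))
reporter.notify_summary()
```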

Implementing a runner/scheduler (outline only):

  1. Subclass RunnerBase to coordinate spiders -> DB -> reporters.
  2. Subclass SchedulerBase to install/trigger schedules (e.g., via systemd/cron in your own code).
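
The spiders -> DB -> reporters flow can be sketched with plain callables injected into the runner, in line with the dependency-injection guideline. Everything here is an assumption made for illustration: the `run` method on the `RunnerBase` stand-in, the `SequentialRunner` class, and the shape of the `sources`, `save`, and `report` collaborators.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable

Document = dict[str, Any]


class RunnerBase(ABC):
    """Stand-in for databank's RunnerBase; the real interface may differ."""

    @abstractmethod
    def run(self) -> int: ...


class SequentialRunner(RunnerBase):
    """Hypothetical runner: every collaborator is injected, nothing is hard-coded."""

    def __init__(
        self,
        sources: dict[str, Callable[[], list[Document]]],
        save: Callable[[list[Document]], int],
        report: Callable[[str], None],
    ) -> None:
        self.sources = sources
        self.save = save
        self.report = report

    def run(self) -> int:
        total = 0
        for name, produce in self.sources.items():
            self.report(f"start {name}")
            try:
                total += self.save(produce())
            except Exception as exc:  # one failing source must not stop the others
                self.report(f"error {name}: {exc}")
            else:
                self.report(f"success {name}")
        self.report(f"summary {total}")
        return total


def broken_source() -> list[Document]:
    raise RuntimeError("upstream unavailable")


stored: list[Document] = []
events: list[str] = []


def save_to_list(docs: list[Document]) -> int:
    stored.extend(docs)
    return len(docs)


runner = SequentialRunner(
    sources={"ok": lambda: [{"id": 1}, {"id": 2}], "bad": broken_source},
    save=save_to_list,
    report=events.append,
)
total = runner.run()
```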

Implementing analytics (outline only):

  1. Subclass AnalyticsBase and implement compute(data, **kwargs).
  2. Optional staged hooks: prepare, validate, transform, finalize.
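
A small sketch of how the staged hooks might wrap `compute`. The `run` driver and the order prepare -> validate -> transform -> compute -> finalize are assumptions, as the README lists the hooks without stating how they are called; `GoalAverages` is a hypothetical example and the scores are made up.

```python
from abc import ABC, abstractmethod
from typing import Any


class AnalyticsBase(ABC):
    """Stand-in for databank's AnalyticsBase; `run` and the hook order are assumptions."""

    def prepare(self, data: Any) -> Any:
        return data

    def validate(self, data: Any) -> None:
        return None

    def transform(self, data: Any) -> Any:
        return data

    @abstractmethod
    def compute(self, data: Any, **kwargs: Any) -> Any: ...

    def finalize(self, result: Any) -> Any:
        return result

    def run(self, data: Any, **kwargs: Any) -> Any:
        data = self.prepare(data)
        self.validate(data)
        data = self.transform(data)
        return self.finalize(self.compute(data, **kwargs))


class GoalAverages(AnalyticsBase):
    """Hypothetical analytics: mean home and away goals over (home, away) score pairs."""

    def prepare(self, data: Any) -> Any:
        return list(data)  # materialize iterators so the data can be scanned twice

    def validate(self, data: Any) -> None:
        if not data:
            raise ValueError("at least one match is required")
        if any(h < 0 or a < 0 for h, a in data):
            raise ValueError("goal counts cannot be negative")

    def compute(self, data: Any, **kwargs: Any) -> Any:
        n = len(data)
        return {
            "home": sum(h for h, _ in data) / n,
            "away": sum(a for _, a in data) / n,
        }

    def finalize(self, result: Any) -> Any:
        return {side: round(value, 2) for side, value in result.items()}


averages = GoalAverages().run([(2, 1), (0, 0), (3, 1)])
```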

Operations: systemd templates

  • See ops/systemd/databank.service and ops/systemd/databank.timer.
  • Customize User, WorkingDirectory, and ExecStart for your environment.

Optional linting (no deps enforced)

  • A minimal Pylint config is included in pyproject.toml under [tool.pylint.*].
  • You can run Pylint in your environment if desired, for example:
    • pylint src/databank (assuming Pylint is installed in your environment)
    • The config disables ABC-related false positives while keeping docstring checks.

Optional typing and linting (ruff/mypy)

  • Minimal configs for Ruff and mypy are also included in pyproject.toml.
  • If you have them installed locally, example commands:
    • ruff check src/databank
    • mypy src/databank
    • Both are optional and will not run unless you invoke them.

License

  • See LICENSE for details.

Abstract-safe initialization helpers

  • databank.config.settings:
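    • DBSettings: a generic container for database settings (not tied to any specific backend).
    • load_db_settings(prefix="DATABANK_DB_"): reads settings from environment variables (e.g., DATABANK_DB_URI, DATABANK_DB_NAME).
    • settings_to_options(settings): converts DBSettings into the dict expected by a generic configure(**options).
    • merge_options(base, extra): merges two options dicts (values from the right-hand side win).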
  • databank.bootstrap.db:
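    • DBBootstrapOptions: bootstrap options (whether to connect and ensure_indexes, plus configure_options).
    • bootstrap_db(db, options): calls configure, connect, and ensure_indexes through the abstract interface.
    • db_session(db, options): a context manager that yields the connected DB and closes it safely on exit.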

Example (shows orchestration only; no concrete backend implementation is included):

from databank import config, bootstrap

# Assume you have a custom DB implementation `MyDB(BaseDB)`; shown for illustration only.
from mypkg.db import MyDB  # your implementation, not part of this repository

settings = config.load_db_settings()
options = config.settings_to_options(settings)

db = MyDB()
boot = bootstrap.DBBootstrapOptions(configure_options=options, connect=True, ensure_indexes=True)

with bootstrap.db_session(db, boot) as conn:
    # Use abstract methods such as conn.insert_many([...]) here.
    pass

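The orchestration layer above pulls in no concrete driver or backend; it depends only on the BaseDB contract, so you can reuse it with your own implementations.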
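
For readers who want to see the described behavior in concrete terms, the sketch below reimplements the helpers in simplified form. It is not the repository's code: the `DBSettings` fields, the `env` parameter, and `RecordingDB` are assumptions, and `db_session` here takes a plain options dict instead of `DBBootstrapOptions`.

```python
import os
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, Mapping, Optional


@dataclass
class DBSettings:
    """Assumed shape: two well-known fields plus everything else as `extra`."""
    uri: Optional[str] = None
    name: Optional[str] = None
    extra: dict[str, str] = field(default_factory=dict)


def load_db_settings(prefix: str = "DATABANK_DB_",
                     env: Optional[Mapping[str, str]] = None) -> DBSettings:
    # `env` is a hypothetical parameter added so the sketch can run without
    # touching the real environment.
    source = os.environ if env is None else env
    values = {k[len(prefix):].lower(): v for k, v in source.items() if k.startswith(prefix)}
    return DBSettings(uri=values.pop("uri", None), name=values.pop("name", None), extra=values)


def settings_to_options(settings: DBSettings) -> dict[str, Any]:
    options: dict[str, Any] = dict(settings.extra)
    if settings.uri is not None:
        options["uri"] = settings.uri
    if settings.name is not None:
        options["name"] = settings.name
    return options


def merge_options(base: Mapping[str, Any], extra: Mapping[str, Any]) -> dict[str, Any]:
    return {**base, **extra}  # keys from `extra` win


@contextmanager
def db_session(db: Any, options: Mapping[str, Any]) -> Iterator[Any]:
    db.configure(**options)
    db.connect()
    try:
        yield db
    finally:
        db.close()  # runs even if the body raises


class RecordingDB:
    """Hypothetical DB double that records the calls it receives."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def configure(self, **options: Any) -> None:
        self.calls.append(("configure", options))

    def connect(self) -> None:
        self.calls.append(("connect",))

    def close(self) -> None:
        self.calls.append(("close",))


fake_env = {"DATABANK_DB_URI": "mongodb://localhost", "DATABANK_DB_NAME": "databank", "HOME": "/root"}
options = merge_options(settings_to_options(load_db_settings(env=fake_env)), {"name": "test"})

db = RecordingDB()
with db_session(db, options):
    pass
```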

Reporter notes (DailyFileReporter)

  • Archive layout: {log_dir}/YYYY/MM/report_{YYYY-MM-DD}.log
  • Time zone: defaults to UTC+8 (alternatives: utc, local, utc+/-H[.m])
  • Example: run python scripts/reporter_demo.py, then look for the log files under ./logs/YYYY/MM/.
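
The archive layout can be illustrated with a short path helper. This is a sketch, not the DailyFileReporter code: `daily_log_path` and its parameters are hypothetical, and only the default UTC+8 offset is modeled (parsing of the `utc+/-H[.m]` form is left out).

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

UTC_PLUS_8 = timezone(timedelta(hours=8))


def daily_log_path(log_dir: str, moment: datetime, tz: timezone = UTC_PLUS_8) -> Path:
    """Build {log_dir}/YYYY/MM/report_{YYYY-MM-DD}.log for a timezone-aware moment."""
    local = moment.astimezone(tz)  # the calendar day is decided in the target zone
    return Path(log_dir) / f"{local:%Y}" / f"{local:%m}" / f"report_{local:%Y-%m-%d}.log"


# 18:30 UTC on 31 December is already 1 January in UTC+8.
moment = datetime(2024, 12, 31, 18, 30, tzinfo=timezone.utc)
path = daily_log_path("logs", moment)
utc_path = daily_log_path("logs", moment, tz=timezone.utc)
```

The same instant lands in different files depending on the configured zone, which is why the time zone setting matters for day-based archiving.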