Policy: scripts are executable, examples are stubs
- Files under scripts/ are runnable and can be executed directly with python.
- Files under src/databank/examples/ are stubs for guidance/docs only and will print instructions.

Runnable scripts:
- python scripts/seed_leagues_mongo.py
- python scripts/seed_seasons_mongo.py
- python scripts/test_get_league_match_list.py
- python scripts/reporter_demo.py

This repository is a pure abstract skeleton intended to define stable contracts for a multi-spider data pipeline. It deliberately contains only abstract/base classes and core models, without any concrete implementations.
Key modules (all abstract-only):
- BaseSpider with clear lifecycle hooks and advisory attributes.
- BaseDB defining minimal persistence operations and hooks.
- BaseReporter defining the reporting lifecycle.
- RunnerBase and SchedulerBase for coordination/scheduling.
- AnalyticsBase for generic analytics pipelines.

Guidelines to extend (no code here, only how-to):
Implementing a spider (outline only):
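The spider contract boils down to three required steps: build_payload, fetch, and parse. As a rough illustration only, the sketch below defines a simplified stand-in for BaseSpider (the real base class, and the real Payload and Documents types, live in this repository and may differ) together with a toy subclass. StaticSpider, crawl, the synchronous signatures, and the default attribute values are all assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Any

# Hypothetical stand-ins: the real Payload/Documents types may differ.
Payload = dict[str, Any]
Documents = list[dict[str, Any]]


class BaseSpider(ABC):
    """Simplified stand-in for the repository's abstract spider contract."""

    max_retries: int = 3             # advisory attribute (value is an assumption)
    request_timeout_s: float = 10.0  # advisory attribute (value is an assumption)

    @abstractmethod
    def build_payload(self, url: str) -> Payload: ...

    @abstractmethod
    def fetch(self, url: str, payload: Payload) -> str: ...

    @abstractmethod
    def parse(self, url: str, content: str, payload: Payload) -> Documents: ...

    def should_fetch(self, url: str) -> bool:
        """Optional hook; fetch everything unless a subclass says otherwise."""
        return True


class StaticSpider(BaseSpider):
    """Toy spider that serves canned content instead of doing network I/O."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def build_payload(self, url: str) -> Payload:
        return {"url": url, "timeout": self.request_timeout_s}

    def fetch(self, url: str, payload: Payload) -> str:
        return self.pages[url]

    def parse(self, url: str, content: str, payload: Payload) -> Documents:
        # One document per non-empty line of the fetched content.
        return [{"url": url, "line": ln} for ln in content.splitlines() if ln.strip()]


def crawl(spider: BaseSpider, url: str) -> Documents:
    """Hypothetical driver: run the three required steps for a single URL."""
    if not spider.should_fetch(url):
        return []
    payload = spider.build_payload(url)
    return spider.parse(url, spider.fetch(url, payload), payload)


docs = crawl(StaticSpider({"demo://a": "x\n\ny"}), "demo://a")
```

Keeping fetch free of parsing logic means the parse step can be tested against saved content without any network access.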
Subclass BaseSpider and implement:
- build_payload(url) -> Payload
- fetch(url, payload) -> str
- parse(url, content, payload) -> Documents
Optional hooks: on_run_start/on_run_end, should_fetch, before_fetch/after_fetch, transform, handle_error, close.
Advisory attributes: max_retries, request_timeout_s.

Implementing a DB backend (outline only):
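A DB backend only has to supply connect, ensure_indexes, insert_many, and close. The following is a minimal in-memory sketch, not a backend shipped with this repository: BaseDB here is a simplified stand-in for the real abstract class, MemoryDB is hypothetical, and both the "id" key and the integer return value of insert_many are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable


class BaseDB(ABC):
    """Simplified stand-in for the repository's abstract DB contract."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def ensure_indexes(self) -> None: ...

    @abstractmethod
    def insert_many(self, docs: Iterable[dict[str, Any]]) -> int: ...

    @abstractmethod
    def close(self) -> None: ...


class MemoryDB(BaseDB):
    """Hypothetical backend that stores documents in a dict keyed by 'id'."""

    def __init__(self) -> None:
        self.connected = False
        self.rows: dict[Any, dict[str, Any]] = {}

    def connect(self) -> None:
        self.connected = True

    def ensure_indexes(self) -> None:
        # Nothing to do: dict keys already enforce uniqueness on 'id'.
        pass

    def insert_many(self, docs: Iterable[dict[str, Any]]) -> int:
        if not self.connected:
            raise RuntimeError("connect() must be called before insert_many()")
        inserted = 0
        for doc in docs:
            if doc["id"] not in self.rows:  # skip duplicates
                self.rows[doc["id"]] = doc
                inserted += 1
        return inserted

    def close(self) -> None:
        self.connected = False


db = MemoryDB()
db.connect()
db.ensure_indexes()
inserted = db.insert_many([{"id": 1}, {"id": 1}, {"id": 2}])
db.close()
```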
Subclass BaseDB and implement: connect, ensure_indexes, insert_many, close.
Optional hooks: on_connect/on_close, before_insert/after_insert.

Implementing a reporter (outline only):
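A reporter receives four notifications over the course of a run: start, success, error, and summary. The sketch below shows one possible shape for such a subclass. BaseReporter is a simplified stand-in for the real abstract class, ListReporter is hypothetical, and the method parameters (spider_name, count, error) are assumptions, since the actual signatures are defined by the repository.

```python
from abc import ABC, abstractmethod


class BaseReporter(ABC):
    """Simplified stand-in for the repository's abstract reporter contract."""

    @abstractmethod
    def notify_start(self, spider_name: str) -> None: ...

    @abstractmethod
    def notify_success(self, spider_name: str, count: int) -> None: ...

    @abstractmethod
    def notify_error(self, spider_name: str, error: Exception) -> None: ...

    @abstractmethod
    def notify_summary(self) -> None: ...


class ListReporter(BaseReporter):
    """Hypothetical reporter that collects report lines in memory."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self.total = 0
        self.errors = 0

    def notify_start(self, spider_name: str) -> None:
        self.lines.append(f"start {spider_name}")

    def notify_success(self, spider_name: str, count: int) -> None:
        self.total += count
        self.lines.append(f"ok {spider_name}: {count} documents")

    def notify_error(self, spider_name: str, error: Exception) -> None:
        self.errors += 1
        self.lines.append(f"error {spider_name}: {error}")

    def notify_summary(self) -> None:
        self.lines.append(f"summary: {self.total} documents, {self.errors} errors")


reporter = ListReporter()
reporter.notify_start("demo")
reporter.notify_success("demo", 3)
reporter.notify_error("demo", ValueError("bad page"))
reporter.notify_summary()
```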
Subclass BaseReporter and implement: notify_start, notify_success, notify_error, notify_summary.
Optional hooks: on_session_start/on_session_end.

Implementing a runner/scheduler (outline only):
Subclass RunnerBase to coordinate spiders -> DB -> reporters.
Subclass SchedulerBase to install/trigger schedules (e.g., via systemd/cron in your own code).

Implementing analytics (outline only):
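An analytics pipeline centres on compute(data, **kwargs), with prepare, validate, transform, and finalize available as optional hooks. The sketch below is one way such a pipeline could be wired. AnalyticsBase is a simplified stand-in for the real abstract class, and the run driver, the hook order, and MeanOfField are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class AnalyticsBase(ABC):
    """Simplified stand-in for the repository's abstract analytics contract."""

    def prepare(self, data: Any) -> Any:
        return data

    def validate(self, data: Any) -> None:
        return None

    def transform(self, data: Any) -> Any:
        return data

    def finalize(self, result: Any) -> Any:
        return result

    @abstractmethod
    def compute(self, data: Any, **kwargs: Any) -> Any: ...

    def run(self, data: Any, **kwargs: Any) -> Any:
        """Hypothetical driver; the hook order shown here is an assumption."""
        data = self.prepare(data)
        self.validate(data)
        data = self.transform(data)
        return self.finalize(self.compute(data, **kwargs))


class MeanOfField(AnalyticsBase):
    """Hypothetical analytic: mean of one numeric field across records."""

    def validate(self, data: Any) -> None:
        if not data:
            raise ValueError("no records to analyse")

    def compute(self, data: Any, **kwargs: Any) -> float:
        field = kwargs.get("field", "value")
        return sum(row[field] for row in data) / len(data)

    def finalize(self, result: float) -> float:
        return round(result, 2)


result = MeanOfField().run([{"goals": 1}, {"goals": 2}, {"goals": 4}], field="goals")
```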
Subclass AnalyticsBase and implement compute(data, **kwargs).
Optional hooks: prepare, validate, transform, finalize.

Operations: systemd templates
Templates are provided in ops/systemd/databank.service and ops/systemd/databank.timer.
Adjust User, WorkingDirectory, and ExecStart for your environment.

Optional linting (no deps enforced)
Pylint settings live in pyproject.toml under [tool.pylint.*].
Run: pylint src/databank (assuming Pylint is installed in your environment)

Optional typing and linting (ruff/mypy)
Configuration lives in pyproject.toml.
Run: ruff check src/databank
Run: mypy src/databank

License
See LICENSE for details.

Abstract-safe initialization helpers
databank.config.settings:
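- DBSettings: generic database settings container (not bound to any specific backend).
- load_db_settings(prefix="DATABANK_DB_"): reads settings from environment variables (e.g. DATABANK_DB_URI, DATABANK_DB_NAME).
- settings_to_options(settings): converts a DBSettings into the dictionary expected by the generic configure(**options).
- merge_options(base, extra): merges two option dictionaries (the right-hand side wins).
databank.bootstrap.db:
- DBBootstrapOptions: startup options (whether to connect, whether to ensure_indexes, plus configure_options).
- bootstrap_db(db, options): calls configure -> connect -> ensure_indexes through the abstract interface.
- db_session(db, options): context manager that yields the connected DB and closes it safely on exit.
Example (shows orchestration only, no concrete backend implementation):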
from databank import config, bootstrap
# Assume you have a custom DB implementation `MyDB(BaseDB)`; it is shown here for illustration only.
from mypkg.db import MyDB  # your implementation, not part of this repository

settings = config.load_db_settings()
options = config.settings_to_options(settings)
db = MyDB()
boot = bootstrap.DBBootstrapOptions(configure_options=options, connect=True, ensure_indexes=True)
with bootstrap.db_session(db, boot) as conn:
    # Use abstract methods such as conn.insert_many([...]) here
    pass
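The orchestration layer above introduces no concrete driver or backend. It depends only on the BaseDB contract, which makes it easy to reuse in your own implementations.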
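To make the configure -> connect -> ensure_indexes -> close flow concrete without importing databank, the sketch below re-creates the idea with the standard library only. It is not the repository's implementation: load_options, session, and RecordingDB are hypothetical names, and lower-casing the environment variable suffix to form option keys is an assumption.

```python
import os
from contextlib import contextmanager


def load_options(prefix="DATABANK_DB_", environ=None):
    """Collect prefixed environment variables into an options dict.

    Assumption: DATABANK_DB_URI becomes the option key 'uri'.
    """
    env = os.environ if environ is None else environ
    return {k[len(prefix):].lower(): v for k, v in env.items() if k.startswith(prefix)}


class RecordingDB:
    """Hypothetical stand-in for a BaseDB subclass; records the call order."""

    def __init__(self):
        self.calls = []
        self.options = {}

    def configure(self, **options):
        self.options = options
        self.calls.append("configure")

    def connect(self):
        self.calls.append("connect")

    def ensure_indexes(self):
        self.calls.append("ensure_indexes")

    def insert_many(self, docs):
        self.calls.append("insert_many")

    def close(self):
        self.calls.append("close")


@contextmanager
def session(db, options, connect=True, ensure_indexes=True):
    """Configure and connect the DB, yield it, and always close it on exit."""
    db.configure(**options)
    if connect:
        db.connect()
    try:
        if ensure_indexes:
            db.ensure_indexes()
        yield db
    finally:
        db.close()  # runs even if the body of the with-block raises


options = load_options(environ={
    "DATABANK_DB_URI": "memory://",
    "DATABANK_DB_NAME": "demo",
    "HOME": "/tmp",  # ignored: does not carry the prefix
})
db = RecordingDB()
with session(db, options) as conn:
    conn.insert_many([{"id": 1}])
```

Because the session only calls methods named in the BaseDB contract, any backend that honours that contract could be dropped in unchanged.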
Reporter notes (DailyFileReporter)
- Log path: {log_dir}/YYYY/MM/report_{YYYY-MM-DD}.log
- Time zone: UTC+8 (selectable: utc, local, utc+/-H[.m])
- Demo: run python scripts/reporter_demo.py, then check the log files under ./logs/<year>/<month>/.
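The log path layout can be reproduced with the standard library. This is a sketch of the naming scheme only, not DailyFileReporter's actual code: daily_log_path is a hypothetical helper, and it assumes the date is taken in a fixed UTC+8 zone and that any datetime passed in is timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

UTC8 = timezone(timedelta(hours=8))


def daily_log_path(log_dir, now=None, tz=UTC8):
    """Return {log_dir}/YYYY/MM/report_{YYYY-MM-DD}.log for the given moment.

    `now` should be timezone-aware; it is converted to `tz` before formatting.
    """
    moment = (now if now is not None else datetime.now(tz)).astimezone(tz)
    return Path(log_dir) / f"{moment:%Y}" / f"{moment:%m}" / f"report_{moment:%Y-%m-%d}.log"


# 20:00 UTC on 31 December is already 04:00 on 1 January in UTC+8,
# so the entry lands in the next year's directory.
path = daily_log_path("logs", datetime(2024, 12, 31, 20, 0, tzinfo=timezone.utc))
```

Converting to the reporting time zone before formatting is what decides which day's file an entry belongs to, which matters most around midnight.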