GitHub backup

Motivation

I take a zero-trust stance toward every platform, and GitHub in particular has broken its neutrality before. Although many of the affected accounts have since been restored, it is hard to guarantee that my own code will not become collateral damage someday, and the GFW may tighten as well.

Decisions & open issues

  • Either a local scheduled job (cron etc.) or a webhook on GitHub's side would work as the trigger. The latter can capture every single commit but has to be configured per repository; the former will miss intermediate states between runs, which matters little for my personal repositories
  • An initial survey shows that most backup tools on the market rely on a third-party platform to pull your repositories; since access tokens and other sensitive credentials are involved, their security cannot be guaranteed
  • So I am writing my own script. The flow is "(incrementally) pull to local - (upload to another platform)"; the upload step is deferred for now mainly because I cannot settle on a target platform (a bit of a digression; the reasons in brief):
    • gitee: too heavily commercialized in the past for my taste; I have not tried it recently
    • gitlab: its mainland-China spin-off is jihulab, which only offers a 90-day trial
    • self-hosted (GitLab, Gitea): worth trying later
    • other minor platforms are ruled out, as their security cannot be vouched for
  • The GitHub API endpoint I found only reads public repos; private repos remain to be solved
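For what it's worth, GitHub's REST API does expose an authenticated endpoint, GET /user/repos, which (unlike /users/{name}/repos) also lists the token owner's private repositories. A minimal sketch of building that request (the token value is a placeholder; the request is only constructed here, not sent):

```python
import requests

# Placeholder; a real personal access token with the "repo" scope
# is needed to see private repositories.
GITHUB_TOKEN = "ghp_xxxx"

# /user/repos returns the authenticated user's repos, private ones included.
req = requests.Request(
    "GET",
    "https://api.github.com/user/repos",
    headers={"Authorization": f"token {GITHUB_TOKEN}"},
    params={"visibility": "all", "per_page": 100},  # paginate with page=N past 100 repos
).prepare()

print(req.url)
```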

Script (for reference)

import os
import requests
import logging
from datetime import datetime
from config_loader import read_config_byconfigparser  # config_loader is my own wrapper around configparser

backup_directory = read_config_byconfigparser('PATH', 'backup_directory')
log_directory = read_config_byconfigparser('PATH', 'log_directory')

# Per-script config section, keyed by this file's name
filename = os.path.basename(__file__)
sub_id = read_config_byconfigparser(filename, 'sub_id')
backup_directory = os.path.join(backup_directory, sub_id)

os.makedirs(log_directory, exist_ok=True)

log_file = os.path.join(log_directory, sub_id)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    filename=log_file
)

if not os.path.exists(backup_directory):
    os.makedirs(backup_directory)
    logging.info(f"create directory '{backup_directory}' for backup")

GITHUB_TOKEN = read_config_byconfigparser(filename, 'github_token')
GITHUB_USER = read_config_byconfigparser(filename, 'github_username')

# Reachability probe: skip the run when the API is too slow (likely blocked)
response = requests.get("https://api.github.com")
RESPONSE_TIME = response.elapsed.total_seconds()
logging.info(f"GitHub API response time: {RESPONSE_TIME} seconds")

THRESHOLD = float(read_config_byconfigparser(filename, 'test_threshold'))  # configparser returns str
if RESPONSE_TIME > THRESHOLD:
    logging.error(f"Response time is too high ({RESPONSE_TIME} > {THRESHOLD}), skipping backup.")
    exit(1)

logging.info(f"Starting backup for user {GITHUB_USER} at {datetime.now()}")
headers = {"Authorization": f"token {GITHUB_TOKEN}"}
# per_page=100 raises the default page size of 30; more than 100 repos would need pagination
response = requests.get(
    f"https://api.github.com/users/{GITHUB_USER}/repos",
    headers=headers,
    params={"per_page": 100},
)

if response.status_code != 200:
    logging.error(f"Failed to fetch repository list. HTTP response code: {response.status_code}")
    exit(1)

repos = response.json()
# configparser returns a single string; split it so membership tests match whole names
exclude_repos = [name.strip() for name in read_config_byconfigparser(filename, 'exclude_repos').split(',')]
for repo in repos:
    repo_name = repo['name']
    if repo_name in exclude_repos:
        logging.info(f"Skipping repository {repo_name} as it is in exclude list.")
        continue
    clone_dir = os.path.join(backup_directory, repo_name)

    if os.path.isdir(clone_dir):
        logging.info(f"Repository {repo_name} already exists, updating all branches.")
        os.chdir(clone_dir)
        if os.system("git fetch --all") != 0:
            logging.error(f"Failed to fetch all branches for {repo_name}.")
        # note: `git pull --all` fetches all remotes but only merges the current branch
        if os.system("git pull --all") != 0:
            logging.error(f"Failed to update all branches for {repo_name}.")
        os.chdir(backup_directory)
        continue
    logging.info(f"Cloning all branches of {repo['clone_url']}")
    if os.system(f"git clone {repo['clone_url']} {clone_dir}") != 0:
        logging.error(f"Failed to clone all branches of {repo['clone_url']}.")

logging.info(f"Backup completed for user {GITHUB_USER} at {datetime.now()}")
logging.shutdown()
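As a design note, `git pull --all` only fast-forwards the currently checked-out branch, whereas a bare mirror clone captures every branch and tag. A sketch of that variant (mirror_name and mirror_backup are my own hypothetical helpers), using subprocess instead of os.system so that stderr can be logged:

```python
import os
import subprocess
import logging

def mirror_name(clone_url: str) -> str:
    """Derive the on-disk directory name for a bare mirror (hypothetical helper)."""
    name = clone_url.rstrip("/").split("/")[-1]
    return name if name.endswith(".git") else name + ".git"

def mirror_backup(clone_url: str, backup_dir: str) -> bool:
    """Create or refresh a bare mirror clone; returns True on success.

    `git clone --mirror` copies all refs (branches and tags), and
    `git remote update --prune` refreshes them on subsequent runs.
    """
    target = os.path.join(backup_dir, mirror_name(clone_url))
    if os.path.isdir(target):
        cmd, cwd = ["git", "remote", "update", "--prune"], target
    else:
        cmd, cwd = ["git", "clone", "--mirror", clone_url, target], None
    try:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:  # git not on PATH
        logging.error("git executable not found")
        return False
    if result.returncode != 0:
        logging.error(f"{' '.join(cmd)} failed: {result.stderr.strip()}")
    return result.returncode == 0
```

The trade-off is that a mirror is bare (no working tree), so browsing the backup locally requires cloning from it; for a pure backup that is usually acceptable.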