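For stock traders, the direction of the Shanghai Composite Index shapes the mood of the whole market. Trading individual stocks while the broad market is rising keeps risk manageable, so predicting the index's moves is a worthwhile exercise. This article walks through the complete process of predicting the Shanghai Composite Index with a CNN implemented in PyTorch.
Code Example
1. Fetching the index data
Everyone recommends the tushare package, but fetching index data with it requires a certain points level. Charge for it or make it free, but gating it behind a points threshold is annoying, so this article uses an alternative: baostock.
Documentation: http://baostock.com/baostock/index.php/%E6%8C%87%E6%95%B0%E6%95%B0%E6%8D%AE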
import os

import baostock as bs
import pandas as pd


def kdata_df():
    """Return daily K-line data for the Shanghai Composite, cached in a CSV file."""
    file_path = './kline_data.csv'
    if not os.path.exists(file_path):
        bs.login()
        rs = bs.query_history_k_data_plus(
            "sh.000001",
            "date,code,open,high,low,close,preclose,volume,amount,pctChg",
            start_date='2018-01-01', end_date='2022-03-29', frequency="d")
        data_list = []
        # Fetch the records one at a time and collect them
        while rs.error_code == '0' and rs.next():
            data_list.append(rs.get_row_data())
        # Log out only after every row has been read
        bs.logout()
        df = pd.DataFrame(data_list, columns=rs.fields)
        # Prediction target: the next trading day's close (NaN on the last row)
        df['pred_close'] = df['close'].shift(-1)
        # Write the result set to a CSV file
        df.to_csv(file_path, index=False)
    return pd.read_csv(file_path)
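2. Formatting the data
First, working day by day, take the price and volume data of the 60 trading days ending on the current day and build a 60×5 matrix as the input.
Then use the following day's closing price as the target output.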
def kdata_list():
    df = kdata_df()
    x_col = ['open', 'high', 'low', 'close', 'volume']
    lst = []
    for index in range(59, len(df)):
        # .loc slicing is inclusive at both ends, so this selects 60 rows
        x_val = df.loc[index - 59:index, x_col].values.astype('float32')
        # Add a leading channel dimension: (1, 60, 5)
        x_val = x_val.reshape(1, x_val.shape[0], x_val.shape[1])
        # Target: the next day's close, shape (1,); NaN for the final window
        y_val = df.loc[index, 'pred_close'].astype('float32')
        y_val = y_val.reshape(1)
        lst.append((x_val, y_val))
    return lst
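The windowing can be checked without downloading any data. The sketch below applies the same indexing to plain Python lists; `make_windows` and the synthetic rows are hypothetical names used only for illustration and are not part of the article's code.

```python
def make_windows(rows, closes, window=60):
    """Pair each `window`-day block of rows with the next day's close.

    rows   -- list of per-day feature lists (e.g. open, high, low, close, volume)
    closes -- list of per-day closing prices, same length as rows
    The final window gets None as its target, since the next close is unknown.
    """
    samples = []
    for index in range(window - 1, len(rows)):
        x = rows[index - window + 1:index + 1]   # `window` consecutive days
        y = closes[index + 1] if index + 1 < len(closes) else None
        samples.append((x, y))
    return samples


# Synthetic data: 100 trading days, 5 features per day
rows = [[float(day)] * 5 for day in range(100)]
closes = [float(day) for day in range(100)]

samples = make_windows(rows, closes)
print(len(samples))        # 41 windows (100 - 59)
print(len(samples[0][0]))  # 60 days per window
print(samples[0][1])       # 60.0, the close on the day after the first window
print(samples[-1][1])      # None, no next-day close yet
```

The last window has no target, which is why the final sample has to be excluded wherever a true value is needed.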
3. Splitting into training and test sets
import torch.utils.data as data


class SDataset(data.Dataset):
    def __init__(self, train=True):
        # Load the formatted samples
        lst = kdata_list()
        train_num = int(len(lst) * 0.9)
        if train:
            self.kdata = lst[:train_num]
        else:
            # Drop the final sample: its next-day close is not known yet (NaN)
            self.kdata = lst[train_num:-1]

    def __len__(self):
        return len(self.kdata)

    def __getitem__(self, index):
        x = self.kdata[index][0]
        y = self.kdata[index][1]
        return x, y


# The samples are NumPy arrays, which DataLoader's default collation
# turns into tensors, so no torchvision transform is needed
train_ds = SDataset()
train_loader = data.DataLoader(train_ds, batch_size=5, shuffle=True)
for x, y in train_loader:
    print(x.shape)  # torch.Size([5, 1, 60, 5])
    break
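The split is chronological: the oldest 90% of the windows train the model and the most recent ones test it. A minimal sketch of the same slicing on a plain list, where `chronological_split` is a hypothetical helper and not part of the article's code:

```python
def chronological_split(samples, train_ratio=0.9):
    """Split time-ordered samples at a single boundary, without shuffling.

    The last sample is left out of the test set because its target
    (the next day's close) is not available yet.
    """
    train_num = int(len(samples) * train_ratio)
    return samples[:train_num], samples[train_num:-1]


samples = list(range(41))        # stand-in for 41 time-ordered windows
train, test = chronological_split(samples)
print(len(train), len(test))     # 36 4
print(train[-1], test[0])        # 35 36: the test windows come after the training windows
```

Because consecutive windows overlap by 59 days, the first test windows still share most of their input days with the last training windows.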
4. Defining the CNN model
import torch.nn as nn


class SModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 32, 3, 1, 1),   # input: (1, 60, 5)
            nn.BatchNorm2d(32),
            nn.ReLU()                    # output: (32, 60, 5)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, 1, 1),  # input: (32, 60, 5)
            nn.BatchNorm2d(64),
            nn.ReLU()                    # output: (64, 60, 5)
        )
        self.out = nn.Linear(64 * 60 * 5, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.reshape(x.shape[0], -1)  # flatten to (batch, 64 * 60 * 5)
        x = self.out(x)
        return x


module = SModule()
# print(module)
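The input size of the final linear layer follows from the standard convolution output-size formula, floor((n + 2 * padding - kernel) / stride) + 1. The sketch below applies it to the 60×5 input; `conv_out` is a hypothetical helper for the arithmetic only.

```python
def conv_out(n, kernel=3, stride=1, padding=1):
    """Output length of one spatial dimension after a convolution."""
    return (n + 2 * padding - kernel) // stride + 1


height, width = 60, 5                # 60 trading days x 5 features
for _ in range(2):                   # both conv layers use 3x3 kernels, stride 1, padding 1
    height, width = conv_out(height), conv_out(width)

flat_features = 64 * height * width  # 64 channels after the second conv layer
print(height, width, flat_features)  # 60 5 19200
```

With a 3x3 kernel, stride 1 and padding 1, each convolution preserves the spatial size, so only the channel count changes between layers.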
5. Training the model
import torch

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(module.parameters(), lr=0.05)

train_ds = SDataset()
train_loader = data.DataLoader(train_ds, batch_size=100, shuffle=True)

# Track the lowest batch loss across all epochs
min_loss = float('inf')
for epoch in range(1000):
    for x, y in train_loader:
        y_hat = module(x)
        loss = loss_fn(y_hat, y)
        print('epoch:', epoch, 'loss:', loss.item())
        # Save the model whenever the batch loss reaches a new minimum
        if loss.item() < min_loss:
            min_loss = loss.item()
            torch.save(module, './m.pkl')
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
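6. Testing the model
# PyTorch 2.6 and later default torch.load to weights_only=True; there,
# pass weights_only=False to load a fully pickled model like this one
module = torch.load('./m.pkl')
module.eval()  # use BatchNorm running statistics instead of batch statistics

test_ds = SDataset(train=False)
test_loader = data.DataLoader(test_ds, batch_size=50)

with torch.no_grad():
    for x, y in test_loader:
        y_hat = module(x)
        loss = loss_fn(y_hat, y)
        for y1, y2 in zip(y, y_hat):
            print('Actual:', int(y1.item()), 'Predicted:', int(y2.item()))
        print('loss:', loss.item())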
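7. Predicting the index
import torch

module = torch.load('./m.pkl')
module.eval()

lst = kdata_list()
# Latest 60-day window with a batch dimension added: (1, 1, 60, 5)
x = torch.from_numpy(lst[-1][0]).unsqueeze(0)
with torch.no_grad():
    y_hat = module(x)
print('Predicted:', y_hat.item())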
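Test results
After 6 hours of training, the mean squared error across 50 batches hovered between 1,000 and 3,000, occasionally jumping above 10,000. The model barely converges, and the test error is outrageously high. Practice shows that anyone who claims to be able to predict the index is a charlatan.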
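MSE is measured in squared index points, so its square root (RMSE) is easier to read. The arithmetic below converts the reported figures; the 3,000-point index level is an assumed round number used only for scale, not a value taken from the data, and `rmse_points` is a hypothetical helper.

```python
import math

ASSUMED_INDEX_LEVEL = 3000.0  # assumption: rough index level, for scale only


def rmse_points(mse):
    """Convert a mean squared error into a root mean squared error in index points."""
    return math.sqrt(mse)


for mse in (1000, 3000, 10000):
    rmse = rmse_points(mse)
    share = rmse / ASSUMED_INDEX_LEVEL * 100
    print(f"MSE {mse:>5} -> RMSE {rmse:5.1f} points ({share:.1f}% of {ASSUMED_INDEX_LEVEL:.0f})")
```

An MSE of 1,000 to 3,000 therefore corresponds to a typical error of roughly 32 to 55 index points, and an MSE of 10,000 to 100 points.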
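This brute-force approach does not look very reliable. Other indicators and methods will be tested in follow-up posts, so stay tuned.
This article is an original work by 陈华. Reposting is welcome, but please credit the source: http://edu.ichenhua.cn/read/246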