dataPrepare.md

crawl data

Most of the data is crawled from 天天基金网 (Tiantian Fund).

python src/crawlFundData.py crawlAllFundData --ifCrawlBasicInformation=True --ifCrawlPortfolio=True --ifCrawlHistoricalValue=True

Set any of these flags to False to skip that category of data. All commands are executed from the root folder of this repo.

data analysis

analyze historical value

Use the command below to compute the return and risk over the past 3 years for all funds.

python src/analyzeData.py analyzeHistoricalValue --ifUseNewIssues=True --ifUseOldIssues=True --ifUseWatchList=False --ifUseAdjustFactorToLatestDay=False --ifPrintFundCode=False

The result: risk_return_noWatchlist_useNewIssues_useOldIssues_notUseAdjustFactor
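As a rough illustration of what "return and risk" can mean for a series of daily net values, here is a minimal sketch. It is not the code in `src/analyzeData.py`. The function name, the 252-trading-day convention, and the use of the standard deviation of daily returns as "risk" are all assumptions made for this example.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: a common convention for daily data


def annualized_return_and_risk(net_values):
    """Return (annualized return, annualized volatility) for daily net values.

    Hypothetical helper for illustration; the repo may define both differently.
    """
    if len(net_values) < 3:
        raise ValueError("need at least three net values")
    # Simple daily returns between consecutive net values.
    daily = [b / a - 1.0 for a, b in zip(net_values, net_values[1:])]
    n = len(daily)
    # Geometric annualization of the total growth over the period.
    total_growth = net_values[-1] / net_values[0]
    annual_return = total_growth ** (TRADING_DAYS_PER_YEAR / n) - 1.0
    # Sample standard deviation of daily returns, scaled to one year.
    mean = sum(daily) / n
    variance = sum((r - mean) ** 2 for r in daily) / (n - 1)
    annual_risk = math.sqrt(variance) * math.sqrt(TRADING_DAYS_PER_YEAR)
    return annual_return, annual_risk
```

Plotting one point per fund with risk on one axis and return on the other would give a scatter chart of the kind referenced above.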

quantitative analysis

Categorize the return and risk over recent days.

python src/analyzeData.py getAverageSlopeForFundsInSameRange --ifUseAdjustFactorToLatestDay=False

This gives the average annualized return. The average return appears to vary across time periods.

averageReturn_30_notUseAdjustFactor

fund managers tend to use similar strategies

We can use Pearson's correlation method to get the correlation between fund '110011' and other funds.

python src/analyzeData.py getCorrelationMatrixForOneFund --fundCodeToAnalyze=110011

If the intermediate files have already been generated, we can set the related flags to True.

correlation_110011
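For readers unfamiliar with the measure, the sketch below shows one way to rank other funds by their Pearson correlation with a target fund. It assumes each fund is represented by a list of daily returns aligned on the same dates. The function names and the toy data (`FUND_A`, `FUND_B`) are hypothetical and not taken from the repo.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (spread_x * spread_y)


def correlations_with(target_code, returns_by_fund):
    """Rank every other fund by its correlation with the target fund."""
    target = returns_by_fund[target_code]
    pairs = [
        (code, pearson(target, series))
        for code, series in returns_by_fund.items()
        if code != target_code
    ]
    # Highest correlation first.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Toy data for illustration only; not real fund returns.
toy_returns = {
    "110011": [0.01, -0.02, 0.03, 0.00],
    "FUND_A": [0.02, -0.04, 0.06, 0.00],   # moves with 110011
    "FUND_B": [-0.01, 0.02, -0.03, 0.00],  # moves against 110011
}
ranked = correlations_with("110011", toy_returns)
```

In this toy data `FUND_A` is a scaled copy of the target and `FUND_B` is its mirror image, so they land at the top and bottom of the ranking.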

confirm it in all funds

I computed the Pearson correlation matrix for all funds.

python src/analyzeData.py getCorrelationMatrixForAllFunds

maximum_correlation

Cosine between portfolio of two funds

Use the cosine of the angle between two funds' vectors in this matrix to represent the similarity of the two funds.

python src/analyzeData.py analyzeCosineForOneFund --nameFund=110011

cosine_110011
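The cosine measure can be sketched as follows, assuming each fund's portfolio is a mapping from stock code to holding weight. This representation, the function name, and the stock codes `S1` to `S3` are assumptions for illustration; the repo may store portfolios differently.

```python
import math


def cosine_similarity(weights_a, weights_b):
    """Cosine of the angle between two portfolios given as {stock: weight}."""
    # Only stocks held by both funds contribute to the dot product.
    shared = set(weights_a) & set(weights_b)
    dot = sum(weights_a[s] * weights_b[s] for s in shared)
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty portfolio is treated as sharing nothing
    return dot / (norm_a * norm_b)


# Hypothetical portfolios for illustration only.
fund_x = {"S1": 0.5, "S2": 0.5}
fund_y = {"S1": 1.0}
fund_z = {"S3": 1.0}
```

With non-negative weights the value lies between 0 (no shared holdings, as with `fund_x` and `fund_z`) and 1 (identical allocation up to scale).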

Compare the cosine similarity with Pearson's correlation.

python src/analyzeData.py compareCosineAndPearsonCorr --nameFund '110011'

cosine_PearsonCorr