OpenAI_New_API_Capacities

2023/11/27

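This post is mainly about how to use the features of the API newly released by OpenAI, with an introduction to the latest capabilities. For the corresponding project, see the GitHub link.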

email: 213193509seu@gmail.com

Language/framework: Python

Reference: https://platform.openai.com/docs/guides

OpenAI;API


ScrapyStartUpBestPractice

2023/11/22

This post mainly explains how to build your first Scrapy project.

scrapy


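Exporting project dependencies to requirements.txt

2023/11/8

When developing a Python project, we often need to export its dependencies to a requirements.txt file. Many people look up the dependencies one by one, but there is a library that can do this for us.

The library introduced below is pipreqs.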
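
pipreqs is normally run from the command line against a project directory. As a rough illustration of the underlying idea (collecting the top-level module names that the source code imports), and not pipreqs's actual implementation, a minimal sketch using only the standard-library `ast` module might look like this. The function name `top_level_imports` and the sample source string are hypothetical.

```python
import ast


def top_level_imports(source):
    """Return the set of top-level module names imported by Python source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" contributes "a"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from a.b import c" contributes "a"; relative imports are skipped
            names.add(node.module.split(".")[0])
    return names


# Hypothetical sample source, used only for illustration.
example = (
    "import os\n"
    "import numpy as np\n"
    "from requests.adapters import HTTPAdapter\n"
    "from . import local\n"
)
print(sorted(top_level_imports(example)))
```

A real tool still has to separate standard-library modules from third-party ones and map import names to package names, which this sketch does not attempt.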


ElectronFristPractice

2023/11/3

This post explains how to build your first Electron app from scratch.


build_self_transformer

2023/11/3

This post explains how to build a Transformer from scratch.

Transformer


Log-in-FastApi-best-practice

2023/10/31

This post explains how to set up a custom logging module in your FastAPI app.


SearchDirectionDecision

2023/10/7

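Overview summary

| Paper title | Local file | Year | Venue | Problem addressed | Method | Key innovations | Datasets | Code | Implementation difficulty |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FAENet: Frame Averaging Equivariant GNN for Materials Modeling | duval23a.pdf | 2023 | JMLR | 1. Uses the `Stochastic Frame Averaging` transformation to preserve the symmetries of the atomic coordinates under projection; the data processing is worth borrowing. 2. Builds the network from a `GNN` and an `MLP`; this part is not especially notable. 3. Achieves good performance. | Stochastic Frame Averaging: PCA-based projection | Symmetry-preserving data augmentation: the authors propose a method that lets a GNN keep the original symmetries of the data. FAENet: a new GNN model that can process the positional relations between atoms freely while ensuring that the symmetries of the data are not broken. FAENet analysis: the authors tested the new method and model on several materials science datasets and found that it outperforms previous methods. | OC20 (S2EF, IS2RE), QM9, QM7-X | 1. FAENet: Frame Averaging Equivariant GNN for Materials modeling (faenet documentation) 2. vict0rsch/faenet (github.com) | Fairly easy to use; modifying it requires familiarity with the code, which is estimated to take two weeks. |
| Graph Neural Network with Local Frame for Molecular Potential Energy Surface | wang22d.pdf | 2022 | PMLR | Information loss in multi-hop message passing between distant atoms, and distinguishing structures that differ in bond angles. | 1. Frame construction 2. Coordinate projection 3. Frame-frame projection | Even an ordinary GNN can encode molecules injectively and thus reach maximum expressivity with coordinate projection and frame-frame projection. | MD17 | GraphPKU/GNN-LF (github.com) | The code is fairly simple; estimated one week to get familiar with it and reproduce the results. |
| Deep Potential Molecular Dynamics: a scalable model with the accuracy of quantum mechanics; End-to-end Symmetry Preserving Inter-atomic Potential Energy Model for Finite and Extended Systems | NeurIPS | 2017/2018 | PRL/NeurIPS | Molecular modeling, inter-atomic potential energy surface modeling | Deep Potential - Smooth Edition (DeepPot-SE) | Extensive, continuously differentiable, linear scalability, symmetry preservation | Various systems, including high-entropy alloys (based on DFT data; the Alibaba Cloud data link has expired) | DeePMD-kit (github.com) | Implemented in TensorFlow and convenient to use, but hard to modify because I am not very familiar with the TensorFlow framework. |
| Efficient determination of the Hamiltonian and electronic properties using graph neural network with complete local coordinates | Su_2023_ | 2023 | MLST (Machine Learning: Science and Technology) | Uses a graph neural network (GNN) architecture and the LC transformation to model the relation between an atomic system and its Hamiltonian, given by the hopping parameters $h_{i\alpha, j\beta}^{(\mathbf{R}_n)} = \langle \phi_{i\alpha}(\mathbf{r}-\boldsymbol{\tau}_i) \vert \hat{H} \vert \phi_{j\beta}(\mathbf{r}-\boldsymbol{\tau}_j-\mathbf{R}_n) \rangle$ | LC transformation, GNN | Uses the Hamiltonian as the regression `label`. | Self-made dataset: graphene and zinc-blende SiGe | | |
| Cormorant: Covariant Molecular Neural Networks | PDF link | 2019 | NeurIPS | 1. Develops a neural network architecture called Cormorant, designed to learn the behavior and properties of complex many-body physical systems. 2. Applied to molecular systems to learn atomic potential energy surfaces for molecular dynamics simulations and ground-state molecular properties computed with density functional theory. 3. Ensures rotational invariance of the network and increases its expressive power. | 1. An input featurization network that operates only on atomic charges/identities and scalar functions of relative positions. 2. A covariant activation network in which every activation is of $\mathrm{SO(3)}$-vector type. 3. A rotation-invariant network at the top that builds scalars from the activations to predict the regression targets. | 1. Clebsch-Gordan nonlinearity, which enables full interaction among all degrees of freedom in the activations. 2. Ensures rotational and translational invariance of the network; the operations implemented by the neurons are directly motivated by the form of known physical interactions. 3. Activations are represented as spherical tensors (SO(3)-vectors), combining Clebsch-Gordan products with mixing by learnable weights; all operations are covariant. | MD-17 (learning molecular force fields and potential energy surfaces), QM-9 (learning ground-state properties of a set of molecules; the data used above) | GitHub link: | Reproducing the code is fairly easy, but the implementation involves fairly complex physics and mathematics, so modifying it may be difficult. |
| Do Transformers Really Perform Bad for Graph Representation? | | | NeurIPS | Investigates how Transformers perform in graph representation learning and proposes Graphormer to improve performance. | Graphormer, which uses structural encodings to improve how a Transformer handles graph-structured data. | 1. Centrality encoding: uses degree centrality to assign an embedding vector to each node. 2. Spatial encoding: measures the shortest-path distance between nodes and adds it as a bias term to the attention matrix. 3. Edge encoding: averages the dot products between edge features and learnable embeddings and adds the result as a bias term in the attention module. | 1. MAG240M, WikiKG90M, and PCQM4M from the OGB Large-Scale Challenge (OGB-LSC) 2. OGBG-MolPCBA, OGBG-MolHIV, ZINC (subset) | Graphormer on GitHub | Has fairly complete code and tutorials; reproduction is not difficult, and the difficulty of modification needs further evaluation. |
| E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials | Link | 2022 | Nature Communications | 1. Introduces a deep learning interatomic potential method for accelerating molecular dynamics simulations. 2. Improves accuracy across a range of molecules and materials while showing remarkable data efficiency. | Neural Equivariant Interatomic Potentials (NequIP): an E(3)-equivariant neural network approach for learning interatomic potentials from ab-initio calculations for molecular dynamics simulations. | E(3)-equivariant convolutions: unlike most current symmetry-aware models, which act only on scalars, NequIP uses E(3)-equivariant convolutions for interactions of geometric tensors, giving a richer and more faithful representation of atomic environments. | 1. MD-17 2. QM9 3. ISO17 | http://github.com/mir-group/nequip | Has fairly complete documentation, project, and implementation; it should be fairly easy to get started. |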

Dataset summary

OC20 Dataset (S2EF, IS2RE):
- The Open Catalyst 2020 Dataset (OC20) targets catalysis in chemical engineering, with a focus on molecules significant in renewable energy applications. It includes over 1.3 million relaxations of molecular adsorptions onto surfaces, making it a substantial dataset of electrocatalyst structures.
- The dataset provides Bader charge data for all final frames in its training and validation systems. It is distributed as a .tar.gz file which, when uncompressed, contains one directory per system ID. Each directory holds the raw Bader charge analysis outputs.
- Data in OC20 is stored as PyTorch Geometric Data objects saved in LMDB files, with training splits of several sizes for the different tasks.
- It comprises 1,281,121 Density Functional Theory (DFT) relaxations across various materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries).

QM9 Dataset:
- QM9 provides quantum chemical properties for a relevant, consistent, and comprehensive chemical space of small organic molecules. It has become a standard benchmark for machine learning predictions of chemical properties.
- The dataset consists of about 130,000 molecules with 19 regression targets. Each molecule includes complete spatial information for the single low-energy conformation of its atoms.
- It contains 133,885 stable small organic molecules made up of CHONF (carbon, hydrogen, oxygen, nitrogen, and fluorine) and is publicly available for data-driven research on material property prediction and chemical space exploration.

QM7-X Dataset:
- QM7-X is a comprehensive dataset with approximately 4.2 million equilibrium and non-equilibrium structures of small organic molecules. It provides 42 physicochemical properties for these molecules, which contain up to seven non-hydrogen atoms (C, N, O, S, Cl).
- The dataset is organized into HDF5 files, and a script is provided to produce a database file named QM7X.db containing atomic positions, atomic numbers, and the required physicochemical properties.
- It covers a broad range of physicochemical properties, making it a valuable resource for researchers interested in the quantum-mechanical properties of small organic molecules.
MD17 Dataset:
- The MD17 (Molecular Dynamics 17) dataset was introduced by Chmiela et al. and provides energies and forces for molecular dynamics trajectories of eight small organic molecules.
- The dataset has been used to develop and evaluate machine-learned potential energy surfaces (PES). Each molecule comprises tens of thousands of energies and forces obtained from DFT (Density Functional Theory) direct dynamics at 500 K. Notable examples of the included molecules are ethanol, malonaldehyde, and aspirin.
- A revised version of the dataset, known as rMD17, was introduced by Anders S. Christensen and O. Anatole von Lilienfeld in 2020. It recalculated the energies and forces at a different level of theory, in a study of the role of gradients in machine learning of molecular energies and forces. A third version contains fewer molecules, computed at the CCSD(T) (coupled cluster with single, double, and perturbative triple excitations) level of theory.
- The original MD17 dataset contained numerical noise, which the revised version addresses so that machine learning benchmarks are accurate.

Self-made dataset (graphene and zinc-blende SiGe):
- Graphene: We perform a molecular dynamics simulation of a 6 × 6 × 1 graphene structure for 5 ps to generate the dataset, then sample 500 structures as the training set and another 500 structures as the validation set.
- Zinc-blende SiGe: The SiGe random alloy dataset is generated by randomly occupying the zinc-blende lattice sites with Si or Ge atoms. The number of possible combinations in a supercell with N sites is given by the combinatorial number C(N, N/2), which grows extremely quickly as the total number of atoms increases.
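
To make the growth of C(N, N/2) concrete, here is a small sketch that evaluates it for a few supercell sizes. It assumes exactly half of the N sites are occupied by Si and the rest by Ge; the function name `sige_configurations` and the chosen sizes are illustrative only.

```python
import math


def sige_configurations(n_sites):
    """Number of ways to place Si on exactly half of n_sites (Ge fills the rest)."""
    if n_sites % 2:
        raise ValueError("n_sites must be even for a 50/50 occupation")
    return math.comb(n_sites, n_sites // 2)


# Illustrative supercell sizes; symmetry-equivalent configurations are not removed.
for n in (8, 16, 64):
    print(n, sige_configurations(n))
```

Even a 64-site supercell already has far more configurations than can be enumerated with DFT, which is why the structures are sampled randomly.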

ISO17 Dataset:
- Description: The ISO17 dataset, also known as “ISO17 - MD Trajectories of C7O2H10 with total energies and atomic forces,” is derived from a set of molecules in the QM9 dataset with a fixed composition of atoms (C7O2H10) in various chemically valid structures. These molecules were selected from the largest set of isomers in the QM9 dataset.
- Composition (these array shapes match the QM7 dataset listing on quantum-machine.org rather than ISO17 itself):
  - X (7165 x 23 x 23): Inputs (Coulomb matrices)
  - T (7165): Labels (atomization energies)
  - P (5 x 1433): Splits for cross-validation
  
- Benchmark for molecular dynamics: ISO17 is a benchmark dataset for molecular dynamics of C7O2H10 isomers, including molecular forces.
- Extension of isomer MD data: The dataset is an extension of the isomer MD data used in prior research.
- References and further reading:
  - Quantum-Machine.org: Datasets
  - SchNetPack Documentation
  - kgcnn Documentation

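Terminology

$E(3)$: the 3-dimensional Euclidean group

This is the group of distance-preserving transformations (isometries) of three-dimensional Euclidean space; such transformations leave distances and angles unchanged. Specifically, an $E(3)$ transformation is composed of the following building blocks:

Rotation: a transformation that keeps the origin fixed and rotates an object about some axis.

Translation: a transformation that moves an object from one position to another without changing its orientation or shape.

Reflection: a transformation that mirrors an object across a plane. Rotations and translations alone give the rigid-body motions; adding reflections gives the full group $E(3)$.

An E(3)-equivariant neural network is designed to respect these symmetries, meaning that its output changes in the corresponding way when the input data is transformed.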
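
As a quick check of the statement that these transformations preserve distances, the sketch below applies a rotation, a translation, and a reflection to a few points and compares the pairwise distances before and after. The coordinates, angle, and shift are arbitrary illustrative values, and all helper names are hypothetical.

```python
import math


def rotate_z(p, angle):
    """Rotate point p about the z-axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(p, shift):
    """Shift point p by the vector `shift`."""
    return tuple(a + b for a, b in zip(p, shift))


def reflect_x(p):
    """Mirror point p across the yz-plane."""
    return (-p[0], p[1], p[2])


def pairwise_distances(points):
    """All distances between distinct pairs of points, in a fixed order."""
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]


# Arbitrary toy coordinates, not taken from any dataset.
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = [reflect_x(translate(rotate_z(p, 0.7), (1.0, -2.0, 3.0))) for p in atoms]

before = pairwise_distances(atoms)
after = pairwise_distances(moved)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(before, after)))
```

Quantities such as interatomic distances are therefore natural inputs for models whose predictions (for example, the total energy) should not change under $E(3)$ transformations.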

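PES (potential energy surface)

The task of taking atomic coordinates as input and predicting the energy and the forces acting on the atoms.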
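
To illustrate what "coordinates in, energy and forces out" means, here is a minimal sketch that uses a classical Lennard-Jones pair potential as a stand-in for a learned PES. The potential, the parameters `epsilon` and `sigma`, and the function name `lj_energy_and_forces` are illustrative assumptions and are not taken from any of the papers above. The forces are the negative gradient of the energy with respect to the atomic positions.

```python
import math


def lj_energy_and_forces(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy and per-atom forces (F = -dE/dr) for 3D points."""
    n = len(positions)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [a - b for a, b in zip(positions[i], positions[j])]  # r_i - r_j
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # -dE/dr for this pair; a positive value means the pair repels
            f_mag = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
            for k in range(3):
                forces[i][k] += f_mag * d[k] / r
                forces[j][k] -= f_mag * d[k] / r
    return energy, forces


# Arbitrary three-atom configuration in reduced units.
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.3, 0.0)]
energy, forces = lj_energy_and_forces(coords)
print(energy)
print(forces)
```

A machine-learned PES replaces the analytic formula with a trained model but exposes the same interface: positions as input, a scalar energy and one force vector per atom as output.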

Equivariance

Equivariance in a neural network means that when the input is transformed, the output changes in a predictable, corresponding way: transforming the input and then applying the network gives the same result as applying the network and then transforming the output.
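
The definition can be checked numerically with a very simple equivariant function, the centroid of a set of points, standing in for a network. In this sketch the transformation is a rotation about the z-axis followed by a translation; the points, angle, and shift are arbitrary, and the helper names are hypothetical.

```python
import math


def transform(p, angle, shift):
    """Rotate point p about the z-axis by `angle`, then translate by `shift`."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    rotated = (c * x - s * y, s * x + c * y, z)
    return tuple(a + b for a, b in zip(rotated, shift))


def centroid(points):
    """A simple equivariant function: the mean position of the points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 2.0, -1.0)]
angle, shift = 1.1, (0.5, -1.0, 2.0)

# Transform the input first, then apply the function ...
transform_then_f = centroid([transform(p, angle, shift) for p in points])
# ... versus apply the function first, then transform the output.
f_then_transform = transform(centroid(points), angle, shift)

print(all(math.isclose(a, b, abs_tol=1e-12)
          for a, b in zip(transform_then_f, f_then_transform)))
```

Invariance is the special case in which the output does not change at all, as for the total energy of a molecule; forces, by contrast, rotate together with the molecule.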

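Individual papers

FAENET

Paper reading report

Paper title

FAENet: Frame Averaging Equivariant GNN for Materials Modeling

Main contributions and goals

- Goal: improve the computational efficiency and predictive power of models for material property prediction.
- Method: introduces a new framework and network, Stochastic Frame-Averaging (SFA) and FAENet, to achieve E(3)-equivariance without losing expressive power.
- Application: the methods apply to a broad range of molecular property prediction tasks and are validated empirically on OC20 (S2EF, IS2RE), QM9, and QM7-X.

Key methods and techniques

- **Stochastic Frame-Averaging (SFA)**: a flexible framework that projects data points into a canonical representation, allowing any model to achieve E(3)-equivariance either exactly (Full FA) or empirically (Stochastic FA) without losing expressive power.
- **FAENet**: a lightweight yet effective GNN whose design is not constrained by symmetry-preservation requirements. FAENet can process geometric information through relative atomic positions while symmetries are strictly preserved through the data, which FA makes possible.

Main challenges and problems

- Existing models based on density functional theory (DFT) are computationally very expensive, which limits the evaluation of large numbers of candidate materials.
- Previous methods that achieve expressivity and generalization can be computationally costly in both training and inference.

Method validation and analysis

- Theoretical validation: the paper verifies the theoretical properties of the proposed method, studies its expressive power, and shows a better trade-off between accuracy and scalability than previous methods on four well-known materials science ML datasets.
- Empirical analysis: the better accuracy-scalability trade-off is demonstrated on OC20 IS2RE and S2EF (2M) for solid-state crystal structure modeling, and on QM7-X and QM9 for molecular modeling.

Conclusion

By introducing SFA and FAENet, the paper proposes a new perspective and method that preserve symmetries through data projection rather than through architectural constraints. These methods aim to produce expressive, robust, and computationally scalable models for large-scale evaluation and prediction of material properties. Empirical validation on several datasets shows that they offer a better trade-off between accuracy and computational scalability.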

GNN-LF

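Paper reading report

Paper title

Graph Neural Network with Local Frame for Molecular Potential Energy Surface

Main contributions and goals

- Goal: model the molecular potential energy surface (PES) efficiently and accurately.
- Method: introduces a new local frame method for learning molecular representations and analyzes its expressive power. By projecting onto a frame, equivariant features (such as 3D coordinates) are converted into invariant features, which captures geometric information without complicating the architecture and decouples the symmetry requirements from the GNN design.
- Results: despite using a simple, ordinary GNN architecture, the model achieves state-of-the-art accuracy with better scalability, requiring only about 30% of the inference time and 10% of the GPU memory of the most efficient baselines.

Method validation and analysis

- GNN-LF model: the model generates an O(3)-equivariant frame for each atom and projects the relative positions and frames of neighboring atoms onto that frame as edge features. This allows an ordinary GNN to work on a graph that carries only invariant features, ensuring expressive power with a simpler architecture.
- Local frame: the local frame method decouples the symmetry requirements, allowing the model to operate in an invariant representation space and to convert back to equivariant predictions when necessary.
- Expressivity: the authors prove that, given non-degenerate frames, even an ordinary GNN can encode molecules injectively and reach maximum expressivity with coordinate projection and frame-frame projection.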
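
The coordinate-projection idea can be sketched in a few lines. The frame below is built from two neighbor directions with Gram-Schmidt and a cross product; this is a simplification chosen for illustration and is not the frame construction used in the paper. With this simplified frame the projected features are invariant under rotations and translations only, since a reflection would flip the sign of the third component. All function names and coordinates are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_frame(center, n1, n2):
    """Orthonormal frame from two non-collinear neighbors of `center`."""
    e1 = normalize(sub(n1, center))
    v = sub(n2, center)
    e2 = normalize(sub(v, scale(e1, dot(v, e1))))  # Gram-Schmidt step
    return e1, e2, cross(e1, e2)


def project(center, neighbors, frame):
    """Coordinates of each relative position expressed in the local frame."""
    return [tuple(dot(sub(p, center), e) for e in frame) for p in neighbors]


def rigid_motion(p):
    """Test transform: rotate about z, then about x, then translate."""
    x, y, z = p
    x, y = math.cos(0.8) * x - math.sin(0.8) * y, math.sin(0.8) * x + math.cos(0.8) * y
    y, z = math.cos(0.3) * y - math.sin(0.3) * z, math.sin(0.3) * y + math.cos(0.3) * z
    return (x + 1.0, y - 2.0, z + 0.5)


center = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.0, 0.0), (0.2, 1.1, 0.3), (-0.5, 0.4, 0.9)]
features = project(center, neighbors, local_frame(center, neighbors[0], neighbors[1]))

moved_center = rigid_motion(center)
moved = [rigid_motion(p) for p in neighbors]
moved_features = project(moved_center, moved, local_frame(moved_center, moved[0], moved[1]))

same = all(math.isclose(a, b, abs_tol=1e-12)
           for f, g in zip(features, moved_features) for a, b in zip(f, g))
print(same)
```

Because the projected coordinates do not change when the molecule is moved, they can be fed to an ordinary GNN as plain edge features.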

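Challenges and problems

- Existing GNNs need special designs to capture geometric information and satisfy symmetry requirements, which leads to complex architectures.
- Models based on hand-crafted descriptors are less accurate because the descriptors are hard-coded, and they cannot handle molecules of variable size or different kinds of atoms.
- Existing GNN models differ in how they incorporate geometric information. Some use only rotation-invariant geometric features, while others use equivariant features that change with coordinate transformations.

Conclusion

By combining a GNN with the local frame method, the paper proposes a new approach to modeling molecular PES that captures geometric information and decouples the symmetry requirements from the GNN design. The proposed model, GNN-LF, uses a simple GNN architecture yet achieves state-of-the-art accuracy and better scalability than efficient baselines. The authors provide theoretical proofs and demonstrate the method's performance and scalability experimentally.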

LC-NET


ChatGPTSecurity

2023/9/30

This post mainly introduces how to use security techniques to penetration-test or manipulate ChatGPT.

ChatGPT security


MySqlInOneHour

2023/9/18

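Here I use Docker to start MySQL.

The official `mysql` image refuses to initialize the database unless a root password option is specified. To solve this, choose one of the following options:

Set a root password:

```
docker run -e MYSQL_ROOT_PASSWORD=my-secret-pw mysql
```

Allow an empty password (not recommended in production):
```
docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=yes mysql
```

Use a random password:

```
docker run -e MYSQL_RANDOM_ROOT_PASSWORD=yes mysql
```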

Switching to `mysql`

Relational / non-relational databases

Clearing the mysql screen: press Ctrl + L.

Common MySQL commands

Show databases

```
SHOW DATABASES;

+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)
```

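Create a database

For example, to create a database named game:

```
CREATE DATABASE game;
```

Drop a database

```
DROP DATABASE game;
```

Database names are case-sensitive on Linux, which includes the Docker setup used here.

Create a table

Before creating a table, select the database with USE:

```
USE game;

CREATE TABLE player (
    id INT,
    name VARCHAR(100),
    level INT,
    exp INT,
    gold DECIMAL(10,2)
);
```

This creates the table structure.

Show the table structure

```
DESC player;
```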

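Modify a table

Add a new column

Use the ALTER TABLE command to add a new column. For example, we can add a column birth_date to store each player's birthday:
```
ALTER TABLE player
ADD COLUMN birth_date DATE;
```
Drop a column

To delete a column, use DROP COLUMN. For example, we can delete the birth_date column we just added:
```
ALTER TABLE player
DROP COLUMN birth_date;
```
Modify a column

Use MODIFY COLUMN to change the data type or other attributes of an existing column. For example, we can change the length of the name column from 100 to 150:
```
ALTER TABLE player
MODIFY COLUMN name VARCHAR(150);
```
Rename a column

To rename a column, use CHANGE COLUMN, which takes the old name, the new name, and the column type. For example, we can rename the exp column to experience:
```
ALTER TABLE player
CHANGE COLUMN exp experience INT;
```
Add a primary key

To ensure that every value in the id column is unique, we can make it the primary key:
```
ALTER TABLE player
ADD PRIMARY KEY (id);
```
Add an index

To speed up queries, we can add indexes on certain columns:
```
ALTER TABLE player
ADD INDEX idx_name (name);
```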

Insert data

```
INSERT INTO player (id, name, level, exp, gold) VALUES (1, '你好', 1, 1, 1);
SELECT * FROM player;
```

Update table data

```
UPDATE player SET level = 2 WHERE name = '你好';
SELECT * FROM player;
```
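
When the same statements are issued from application code, values are usually passed as parameters instead of being pasted into the SQL string. The sketch below shows this pattern with Python's built-in `sqlite3` module and an in-memory database, purely so that it runs without a MySQL server; this substitution is an assumption of the example. SQLite accepts the same CREATE TABLE statement, and its placeholder is `?`, whereas common MySQL drivers typically use `%s`.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL `game` database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player ("
    "id INT, name VARCHAR(100), level INT, exp INT, gold DECIMAL(10,2))"
)

# Insert one row; the values are bound as parameters, not formatted into the SQL.
conn.execute(
    "INSERT INTO player (id, name, level, exp, gold) VALUES (?, ?, ?, ?, ?)",
    (1, "你好", 1, 1, 1),
)

# Update the row, again with bound parameters.
conn.execute("UPDATE player SET level = ? WHERE name = ?", (2, "你好"))

rows = conn.execute("SELECT id, name, level, exp, gold FROM player").fetchall()
print(rows)
conn.close()
```

Binding parameters this way avoids quoting problems with values such as the non-ASCII name above and protects against SQL injection.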

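Database import and export

1. Exporting a database

To export a MySQL database, you can use the mysqldump tool. The following basic command exports an entire database to a .sql file:

```
mysqldump -u [username] -p[password] [database_name] > [filename].sql
```

- [username]: the MySQL user name.
- [password]: that user's password. Note that there is no space between -p and the password.
- [database_name]: the name of the database you want to export.
- [filename].sql: the name of the file to save to.

2. Importing a database

To import a .sql file into a MySQL database, use the following command:

```
mysql -u [username] -p[password] [database_name] < [filename].sql
```

The parameters have the same meaning as in the export command above.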

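3. Table joins

Joins are a core concept in relational databases: they let you combine data from two or more tables based on related columns. The common join types are:

- Inner join (INNER JOIN): returns only the rows that have a match in both tables.
- Left join (LEFT JOIN or LEFT OUTER JOIN): returns all rows from the left table, even when the right table has no matching row.
- Right join (RIGHT JOIN or RIGHT OUTER JOIN): returns all rows from the right table, even when the left table has no matching row.
- Full join (FULL JOIN or FULL OUTER JOIN): returns all rows from both the left and the right table.
- Cross join (CROSS JOIN): returns every possible combination of rows from the left and right tables.

Example

Suppose we have two tables: employees and departments.

employees table:

| emp_id | emp_name | dept_id |
| --- | --- | --- |
| 1 | Alice | 10 |
| 2 | Bob | 20 |
| 3 | Charlie | NULL |

departments table:

| dept_id | dept_name |
| --- | --- |
| 10 | HR |
| 20 | Finance |
| 30 | Marketing |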

1. Inner join

```
SELECT emp_name, dept_name
FROM employees
INNER JOIN departments ON employees.dept_id = departments.dept_id;
```

Result:

| emp_name | dept_name |
| --- | --- |
| Alice | HR |
| Bob | Finance |

2. Left join

```
SELECT emp_name, dept_name
FROM employees
LEFT JOIN departments ON employees.dept_id = departments.dept_id;
```

Result:

| emp_name | dept_name |
| --- | --- |
| Alice | HR |
| Bob | Finance |
| Charlie | NULL |

3. Right join

```
SELECT emp_name, dept_name
FROM employees
RIGHT JOIN departments ON employees.dept_id = departments.dept_id;
```

Result:

| emp_name | dept_name |
| --- | --- |
| Alice | HR |
| Bob | Finance |
| NULL | Marketing |

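4. Full join

MySQL does not support FULL JOIN directly; it can be emulated by combining a LEFT JOIN and a RIGHT JOIN with UNION. The result would be:

| emp_name | dept_name |
| --- | --- |
| Alice | HR |
| Bob | Finance |
| Charlie | NULL |
| NULL | Marketing |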
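
A full join can be emulated by taking the UNION of two outer joins. The sketch below demonstrates this with Python's built-in `sqlite3` module and an in-memory database rather than MySQL, so that it runs without a server; that substitution is an assumption of the example. In MySQL the second SELECT would normally be written as a RIGHT JOIN; here the table order is swapped in a LEFT JOIN instead, because older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

# In-memory SQLite database holding the two example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, emp_name TEXT, dept_id INT);
CREATE TABLE departments (dept_id INT, dept_name TEXT);
INSERT INTO employees VALUES (1, 'Alice', 10), (2, 'Bob', 20), (3, 'Charlie', NULL);
INSERT INTO departments VALUES (10, 'HR'), (20, 'Finance'), (30, 'Marketing');
""")

# Rows from the left outer join plus rows from the mirrored outer join;
# UNION removes the matched rows that appear in both halves.
full_join = """
SELECT emp_name, dept_name
FROM employees
LEFT JOIN departments ON employees.dept_id = departments.dept_id
UNION
SELECT emp_name, dept_name
FROM departments
LEFT JOIN employees ON employees.dept_id = departments.dept_id
"""

rows = conn.execute(full_join).fetchall()
for row in rows:
    print(row)  # NULL values appear as None
conn.close()
```

Note that UNION also collapses rows that are genuinely duplicated in the data, so this emulation assumes the selected columns identify each row uniquely.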

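5. Cross join

```
SELECT emp_name, dept_name
FROM employees
CROSS JOIN departments;
```

This returns every possible combination of an employee with a department: 3 × 3 = 9 rows in total.

Together, these examples show how the join types differ.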

MySQL


Computer Architecture

2023/9/17

This post mainly covers computer architecture and related operating system knowledge.

Computer architecture


LinkedList's Blog

Learning notes

This is the blog of Linkedlist771, mainly for recording learning notes. Thank you for visiting!
