Understanding the Transformer

2024/7/19 10:43:23 Tags: transformer, deep learning, artificial intelligence

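Regarding the Transformer: the meaning of Q, K, and V suggests that it behaves more like a learnable query system. Perhaps earlier search-engine algorithms were related to this idea, or some branch of search algorithms works in a similar way.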

 


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights that convert each token (a token could be a word, sentence piece, subword, character, etc.) into a vector of, say, 500 trainable values.
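To make the lookup concrete, here is a minimal sketch in plain Python. The toy vocabulary, the 4-value vector size, and the helper names `make_embedding_table` and `embed` are assumptions made for illustration; a real model would use a tokenizer and a framework layer such as an embedding module, and would update the table during training.

```python
import random

def make_embedding_table(vocab_size, dim, seed=0):
    """Create a vocab_size x dim table of small random values (the trainable weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    """Look up one row of the table per token id."""
    return [table[t] for t in token_ids]

# Hypothetical toy vocabulary; a real tokenizer would supply the ids.
vocab = {"the": 0, "cat": 1, "sat": 2}
table = make_embedding_table(vocab_size=len(vocab), dim=4)
vectors = embed([vocab["the"], vocab["cat"], vocab["the"]], table)
```

Both occurrences of "the" map to the same row, which is why position has to be supplied separately.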

Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is because attention and linear layers do not keep track of token order on their own. So we manually pass this in by adding a vector of sine and cosine values to the embedding vector (in the original Transformer, these values span every dimension of the embedding, at different frequencies).
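As a rough sketch, the sinusoidal scheme from the original Transformer can be written as below. The function names `positional_encoding` and `add_positions` are made up for this example, and learned positional embeddings are a common alternative that is not shown here.

```python
import math

def positional_encoding(seq_len, dim):
    """Sinusoidal encoding: even indices use sine, odd indices use cosine."""
    pe = [[0.0] * dim for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, dim, 2):
            # Each pair of dimensions gets its own frequency.
            angle = pos / (10000 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            if i + 1 < dim:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(embeddings):
    """Add the position vector to each token embedding, element by element."""
    dim = len(embeddings[0])
    pe = positional_encoding(len(embeddings), dim)
    return [[e + p for e, p in zip(row, pe_row)]
            for row, pe_row in zip(embeddings, pe)]
```

Two identical token vectors at different positions come out different after `add_positions`, which is the information the later layers need.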

This sequence of vectors goes through an attention layer, which works like a learnable database search function with keys, queries and values. In this case, we are “searching” for the most likely next token.
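The "search" analogy can be sketched with scaled dot-product attention: each query is compared against every key, the scores become weights, and the result is a weighted mix of the values. This simplified version assumes the queries, keys, and values are already given; in a real layer they come from learned linear projections of the input, and multiple heads and masking are omitted here.

```python
import math

def softmax(xs):
    """Turn scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

Unlike a hard database lookup, every value contributes a little; the closest-matching keys simply contribute the most.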

The Feed Forward block is a small stack of linear layers (two, with a nonlinearity in between, in the original Transformer), applied to each embedding in the sequence separately (i.e. the input is a 3-dim tensor instead of a 2-dim one).
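A minimal sketch of the "applied to each position separately" idea follows. It assumes the two-layer form with a ReLU in between, as in the original Transformer design; the weights are passed in as plain lists, and the names `linear` and `feed_forward` are illustrative only.

```python
def linear(x, weight, bias):
    """y = W x + b for one vector x; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def feed_forward(sequence, w1, b1, w2, b2):
    """Apply the same two-layer network to each position independently."""
    out = []
    for x in sequence:
        hidden = [max(0.0, h) for h in linear(x, w1, b1)]  # ReLU
        out.append(linear(hidden, w2, b2))
    return out
```

Because the same weights are reused at every position and no position looks at another, identical input vectors always produce identical outputs here; mixing information across positions is left to the attention layer.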

Then the final Linear layer is where we get our predicted next token, in the form of a vector of scores over the vocabulary, to which we apply a softmax so that the values fall in the range of 0 to 1 and sum to 1.
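The last step can be sketched as follows: a linear layer maps the final hidden vector to one score per vocabulary entry, and softmax turns the scores into probabilities. The weights, the three-word vocabulary, and the `greedy_pick` helper are assumptions for illustration; real models often sample from the distribution rather than always taking the largest value.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities in [0, 1] that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_probs(hidden, out_weight, out_bias):
    """Final linear layer (one row per vocabulary entry) followed by softmax."""
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(out_weight, out_bias)]
    return softmax(logits)

def greedy_pick(probs, id_to_token):
    """Return the token with the highest probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return id_to_token[best]

# Hypothetical weights and vocabulary, chosen by hand for the example.
id_to_token = ["the", "cat", "sat"]
probs = next_token_probs([1.0, 0.0],
                         [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]],
                         [0.0, 0.0, 0.0])
predicted = greedy_pick(probs, id_to_token)
```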

There are two sides because when that diagram was developed, it was being used for language translation. But generative language models for next-token prediction just use the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext — PyTorch Tutorials 2.0.1+cu117 documentation



http://www.niftyadmin.cn/n/4926703.html
